Abstract. Multi-dimensional mean-payoff and energy games provide the mathematical foundation for
the quantitative study of reactive systems, and play a central role in the emerging quantitative theory
of verification and synthesis. In this work, we study the strategy synthesis problem for games with
such multi-dimensional objectives along with a parity condition, a canonical way to express $\omega$-regular
conditions. While in general, the winning strategies in such games may require infinite memory, for
synthesis the most relevant problem is the construction of a finite-memory winning strategy (if one
exists). Our main contributions are as follows. First, we show a tight exponential bound (matching
upper and lower bounds) on the memory required for finite-memory winning strategies in both
multi-dimensional mean-payoff and energy games along with parity objectives. This significantly improves
the triple exponential upper bound for multi energy games (without parity) that could be derived
from results in literature for games on VASS (vector addition systems with states). Second, we present
an optimal symbolic and incremental algorithm to compute a finite-memory winning strategy (if one
exists) in such games. Finally, we give a complete characterization of when finite memory of strategies
can be traded off for randomness. In particular, we show that for one-dimensional mean-payoff parity
games, randomized memoryless strategies are as powerful as their pure finite-memory counterparts.
Two-player games on graphs provide the mathematical foundation to study many important problems in
computer science. Game-theoretic formulations have especially proved useful for synthesis [19,35,33],
verification [2], refinement [30], and compatibility checking [20] of reactive systems, as well as in analysis of
emptiness of automata [37].

Games played on graphs are repeated games that proceed for an infinite number of rounds. The state 
space of the graph is partitioned into player 1 states and player 2 states (player 2 is the adversary of player 1).
The game starts at an initial state, and if the current state is a player 1 (resp. player 2) state, then player 
1 (resp. player 2) chooses an outgoing edge. This choice is made according to a strategy of the player: given 
the sequence of visited states, a pure (resp. randomized) strategy chooses an outgoing edge (resp. probability 
distribution over outgoing edges). This process of choosing edges is repeated forever, and gives rise to an 
outcome of the game, called a play, that consists of the infinite sequence of states that are visited. 
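The round-by-round unfolding described above can be sketched in Python for pure strategies. This is an illustration only: the graph encoding and the names `play_prefix`, `p1_alternate` and `p2_back` are hypothetical, and since a play is infinite, the sketch only computes a finite prefix of it.

```python
from typing import Callable, Dict, List, Tuple

State = str
# Hypothetical encoding: a pure strategy maps the sequence of visited states
# (ending in a state owned by the player) to the chosen successor.
Strategy = Callable[[Tuple[State, ...]], State]


def play_prefix(edges: Dict[State, List[State]], owner: Dict[State, int],
                strategies: Dict[int, Strategy], s_init: State,
                n_rounds: int) -> List[State]:
    """Return the first n_rounds + 1 states of the play induced by two pure strategies."""
    prefix = [s_init]
    for _ in range(n_rounds):
        current = prefix[-1]
        player = owner[current]                  # the owner of the current state moves
        successor = strategies[player](tuple(prefix))
        if successor not in edges[current]:      # strategies must follow edges
            raise ValueError(f"illegal move {current} -> {successor}")
        prefix.append(successor)
    return prefix


# Example: player 1 owns a and c, player 2 owns b.
EDGES = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a"]}
OWNER = {"a": 1, "b": 2, "c": 1}


def p1_alternate(prefix: Tuple[State, ...]) -> State:
    # History-dependent: from a, go to b on odd-numbered visits to a, to c otherwise.
    if prefix[-1] == "a":
        return "b" if prefix.count("a") % 2 == 1 else "c"
    return "a"  # c has a single successor


def p2_back(prefix: Tuple[State, ...]) -> State:
    return "a"  # player 2 always returns to a


print(play_prefix(EDGES, OWNER, {1: p1_alternate, 2: p2_back}, "a", 6))
# ['a', 'b', 'a', 'c', 'a', 'b', 'a']
```

Note that `p1_alternate` needs to look at the history: a memoryless strategy could not alternate between `b` and `c` from `a`.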

Traditionally, games on graphs have been studied with Boolean objectives such as reachability, 
liveness, $\omega$-regular conditions formalized as the canonical parity objectives, strong fairness objectives,
etc. [29,25,26,40,37,28]. While games with quantitative objectives have been studied in the game theory
literature [24,41,32], their application in synthesis and other problems in verification is quite recent. The two 
classical quantitative objectives that are most relevant in verification and synthesis are the mean-payoff and 
energy objectives. In games on graphs with quantitative objectives, the game graph is equipped with a weight 
function that assigns integer-valued weights to every edge. For mean-payoff objectives, the goal of player 1 
is to ensure that the long-run average of the weights is above a threshold. For energy objectives, the goal of
player 1 is to ensure that the sum of the weights stays non-negative at all times. In applications of verification and
synthesis, the quantitative objectives that typically arise are (i) multi-dimensional quantitative objectives
(i.e., conjunctions of several quantitative objectives), e.g., to express properties like "the average response time
between a grant and a request is below a given threshold $\nu_1$, and the average number of unnecessary grants
is below threshold $\nu_2$"; and (ii) conjunctions of quantitative objectives with a Boolean objective, such as a
mean-payoff parity objective that can express properties like the average response time is below a threshold 
along with satisfying a liveness property. In summary, the quantitative objectives can express properties 
related to resource requirements, performance, and robustness; multiple objectives can express the different, 
potentially dependent or conflicting objectives; and the Boolean objective specifies functional properties such 
as liveness or fairness. The game theoretic framework of multi-dimensional quantitative games and games 
with conjunction of quantitative and Boolean objectives has recently been shown to have many applications 
in verification and synthesis, such as synthesizing systems with quality guarantees [5], synthesizing robust
systems [6], performance-aware synthesis of concurrent data structures [11], analyzing permissivity in games and
synthesis [9], simulation between quantitative automata [16], generalizing Boolean simulation to quantitative 
simulation distance [12], etc. Moreover, multi-dimensional energy games are equivalent to a decidable class of 
games on VASS (vector addition systems with states) that are the model to verify games over multi-counter 
systems and Petri nets [10]. 

In literature, there are many recent works on the theoretical analysis of multi-dimensional quantitative
games, such as mean-payoff parity games [18,9], energy-parity games [14], multi-dimensional energy
games [17], and multi-dimensional mean-payoff games [17,39]. Most of these works focus on establishing 
the computational complexity of the problem of deciding if player 1 has a winning strategy. From the per- 
spective of synthesis and other related problems in verification, the most important problem is to obtain 
a witness finite-memory winning strategy (if one exists). The winning strategy in the game corresponds to 
the desired controller for (or implementation of) the system in synthesis, and for implementability a finite- 
memory strategy is essential. In this work we consider the problem of finite-memory strategy synthesis in 
multi-dimensional quantitative games in conjunction with parity objectives, and the problem of existence of 
memory-efficient randomized strategies for such games. These are the core and foundational problems in the 
emerging theory of quantitative verification and synthesis. 

Our contributions. In this work, we study for the first time multi-dimensional energy and mean-payoff 
objectives in conjunction with parity objectives. Conjunction of parity objectives with multi-dimensional 
quantitative objectives has not been considered before. Since we consider the synthesis of finite-memory 
strategies, it follows from the results of [17] that both the problems (multi-dimensional energy with parity 
and multi-dimensional mean-payoff with parity) are equivalent. Our main results for finite-memory strategy 
synthesis for multi-dimensional energy parity games are as follows. (i) Optimal memory bounds. We
first show that memory of exponential size is sufficient in multi-dimensional energy parity games. Our
result is a significant improvement over the result that can be obtained naively from the results known
in literature, which yields a triple exponential bound, even in the case of multi-dimensional energy games
without parity. Second, we show a matching lower bound by presenting a family of game graphs where
exponential memory is necessary in multi-dimensional energy games (without parity), even when all the
transition weights belong to $\{-1, 0, +1\}$. Thus we establish optimal memory bounds for the finite-memory
strategy synthesis problem. (ii) Symbolic and incremental algorithm. We present a symbolic algorithm
(in the sense of [22], i.e., using a compact antichain representation of sets by their minimal elements) to 
compute a finite-memory winning strategy, if one exists, for multi-dimensional energy parity games. Our 
algorithm is parameterized by the range of energy levels to consider during its execution. So, we can use it 
in an incremental approach: first, we search for finite-memory winning strategies with a small range, and 
increment the range only when necessary. We also establish a bound on the maximal range to consider which 
ensures completeness of the incremental approach. In the worst case the algorithm requires exponential 
time. Since exponential size memory is required (and also the decision problem is coNP-complete [17]), the 
worst case exponential bound can be considered as optimal. Moreover, as our algorithm is symbolic and 
incremental, in most relevant problems in practice, it is expected to be efficient. We also consider when 
the (pure) finite-memory strategies can be traded off for conceptually much simpler randomized strategies.
(iii) Randomized strategies. We show that for energy objectives randomization is not helpful (as energy
objectives are similar in spirit to safety objectives), even with only one player, nor is it for two-player
multi-dimensional mean-payoff objectives. However, randomized memoryless strategies suffice for one-player
multi-dimensional mean-payoff parity games. For the important special case of mean-payoff parity objectives
(conjunction of a single mean-payoff objective and a parity objective), we show that in games, finite-memory strategies
can be traded off for randomized memoryless strategies.
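The antichain representation mentioned in (ii) can be illustrated on its own: an upward-closed set of vectors (for instance, a set of sufficient initial credits, which is upward closed since more credit never hurts) is stored as the finite antichain of its componentwise-minimal elements. The Python sketch below is a generic illustration of that data structure, not the paper's symbolic algorithm; the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Vec = Tuple[int, ...]


def leq(u: Vec, v: Vec) -> bool:
    """Componentwise order on integer vectors."""
    return all(a <= b for a, b in zip(u, v))


def minimal_elements(vectors: Iterable[Vec]) -> List[Vec]:
    """Antichain of the componentwise-minimal vectors of a finite set.

    Vectors are scanned in lexicographic order: if u <= v and u != v, then u
    comes before v, so a vector is kept iff no kept vector lies below it.
    """
    antichain: List[Vec] = []
    for v in sorted(set(vectors)):
        if not any(leq(u, v) for u in antichain):
            antichain.append(v)
    return antichain


def in_upward_closure(v: Vec, antichain: List[Vec]) -> bool:
    """v is in the upward-closed set iff it dominates some minimal element."""
    return any(leq(u, v) for u in antichain)


CREDITS = [(2, 0), (0, 3), (2, 1), (1, 3), (3, 3)]
BASIS = minimal_elements(CREDITS)
print(BASIS)                                                               # [(0, 3), (2, 0)]
print(in_upward_closure((1, 1), BASIS), in_upward_closure((2, 5), BASIS))  # False True
```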

Related works. Games with a single mean-payoff objective have been studied in [24,41], and games with a 
single energy objective in [13]; their equivalence was established in [8]. The one-dimensional mean-payoff parity
game problem has been studied in [18]: an exponential algorithm was given to decide if there exists a winning
strategy (which in general was shown to require infinite memory), and an improved algorithm was presented
in [9]. The one-dimensional energy parity game problem has been studied in [14]: it was shown that deciding the
existence of a winning strategy is in NP $\cap$ coNP, and an exponential algorithm was given. It was also shown
in [14] that, for one-dimensional energy parity objectives, finite-memory strategies with exponential memory 
are sufficient, and the decision problem for mean-payoff parity objective can be reduced to energy parity 
objective. Games on VASS with several different winning objectives have been studied in [10], and from 
the results of [10] it follows that in multi-dimensional energy games, winning strategies with finite memory 
are sufficient (and a triple exponential bound on memory can be derived from the results). The complex- 
ity of multi-dimensional energy and mean-payoff games was studied in [17,39]. It was shown in [17] that 
in general, winning strategies in multi-dimensional mean-payoff games require infinite memory, whereas for 
multi-dimensional energy games, finite-memory strategies are sufficient. Moreover, for finite-memory strate- 
gies, the multi-dimensional mean-payoff and energy games coincide, and optimal computational complexity 
for deciding the existence of a winning strategy was established as coNP-complete [17,39]. Multi-dimensional 
mean-payoff games with infinite-memory strategies were studied in [39], and optimal computational com- 
plexity results were established. Various decision problems over multi-dimensional energy games were studied 
in [27]. 

2 Preliminaries 

We consider two-player game structures and denote the two players by $\mathcal{P}_1$ and $\mathcal{P}_2$.

Multi-weighted two-player game structures. A multi-weighted two-player game structure is a tuple
$G = (S_1, S_2, s_{\mathsf{init}}, E, k, w)$ where (i) $S_1$ and $S_2$ resp. denote the finite sets of states belonging to $\mathcal{P}_1$ and $\mathcal{P}_2$,
with $S_1 \cap S_2 = \emptyset$; (ii) $s_{\mathsf{init}} \in S = S_1 \cup S_2$ is the initial state; (iii) $E \subseteq S \times S$ is the set of edges s.t. for all $s \in S$,
there exists $s' \in S$ s.t. $(s, s') \in E$; (iv) $k \in \mathbb{N}$ is the dimension of the weight vectors; and (v) $w\colon E \to \mathbb{Z}^k$ is
the multi-weight labeling function. The game structure $G$ is one-player if $S_2 = \emptyset$. A play in $G$ is an infinite
sequence of states $\pi = s_0 s_1 s_2 \dots$ s.t. $s_0 = s_{\mathsf{init}}$ and for all $i \geq 0$, we have $(s_i, s_{i+1}) \in E$. The prefix up
to the $n$-th state of play $\pi = s_0 s_1 \dots s_n \dots$ is the finite sequence $\pi(n) = s_0 s_1 \dots s_n$. Let $\mathsf{First}(\pi(n))$ and
$\mathsf{Last}(\pi(n))$ resp. denote $s_0$ and $s_n$, the first and last states of $\pi(n)$. A prefix $\pi(n)$ belongs to $\mathcal{P}_i$, $i \in \{1, 2\}$,
if $\mathsf{Last}(\pi(n)) \in S_i$. The set of plays of $G$ is denoted by $\mathsf{Plays}(G)$ and the corresponding set of prefixes is
denoted by $\mathsf{Prefs}(G)$. The set of prefixes that belong to $\mathcal{P}_i$ is denoted by $\mathsf{Prefs}_i(G)$. The energy level vector
of a sequence of states $\rho = s_0 s_1 \dots s_n$ s.t. for all $0 \leq i < n$, we have $(s_i, s_{i+1}) \in E$, is $\mathsf{EL}(\rho) = \sum_{i=0}^{n-1} w(s_i, s_{i+1})$,
and the mean-payoff vector of a play $\pi = s_0 s_1 \dots$ is $\mathsf{MP}(\pi) = \liminf_{n \to \infty} \frac{1}{n} \mathsf{EL}(\pi(n))$.
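For ultimately periodic plays, i.e., a finite prefix $\rho$ followed by a finite sequence of states repeated forever, the quantities just defined are easy to compute. The following Python sketch is an illustration with hypothetical names, assuming plays are given as a prefix plus a repeated cycle and weights as a dictionary over edges: it computes $\mathsf{EL}$ of a finite sequence, and the mean-payoff vector of such a play as the energy level of one loop divided by its number of edges.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

State = str
Vec = Tuple[int, ...]
Weights = Dict[Tuple[State, State], Vec]


def energy_level(seq: List[State], w: Weights, k: int) -> Vec:
    """EL(rho): componentwise sum of w over the consecutive edges of seq."""
    total = [0] * k
    for s, t in zip(seq, seq[1:]):
        for j in range(k):
            total[j] += w[(s, t)][j]
    return tuple(total)


def lasso_mean_payoff(cycle: List[State], w: Weights, k: int) -> Tuple[Fraction, ...]:
    """MP of a play that eventually repeats cycle forever.

    The finite prefix has no influence on the limit, so only the cycle is
    needed: MP is EL of one full loop divided by the number of its edges.
    """
    loop = cycle + [cycle[0]]  # close the loop: the last state returns to the first
    el = energy_level(loop, w, k)
    return tuple(Fraction(x, len(cycle)) for x in el)


WEIGHTS = {("a", "b"): (1, -1), ("b", "a"): (-2, 3)}
print(energy_level(["a", "b", "a", "b"], WEIGHTS, 2))  # (0, 1)
print(lasso_mean_payoff(["a", "b"], WEIGHTS, 2))        # (Fraction(-1, 2), Fraction(1, 1))
```

Exact rationals (`Fraction`) are used so that the comparison with a rational threshold vector is not affected by floating-point rounding.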

Parity. A game structure $G$ is extended with a priority function $p\colon S \to \mathbb{N}$ to $G_p = (S_1, S_2, s_{\mathsf{init}}, E, k, w, p)$.
Given a play $\pi = s_0 s_1 s_2 \dots$, let $\mathsf{Inf}(\pi) = \{s \in S \mid \forall\, m \geq 0, \exists\, n > m \text{ s.t. } s_n = s\}$ denote the set of states that
appear infinitely often along $\pi$. The parity of a play $\pi$ is defined as $\mathsf{Par}(\pi) = \min\{p(s) \mid s \in \mathsf{Inf}(\pi)\}$. In the
following definitions, we denote any game by $G_p$ with no loss of generality.
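On the same kind of ultimately periodic plays, $\mathsf{Inf}(\pi)$ is exactly the set of states of the repeated part, so the parity can be read off the cycle alone. A minimal Python sketch with hypothetical names:

```python
from typing import Dict, List, Set

State = str


def inf_states(prefix: List[State], cycle: List[State]) -> Set[State]:
    """Inf of prefix . cycle^omega: prefix-only states occur finitely often."""
    return set(cycle)  # the prefix is irrelevant; kept as a parameter for clarity


def parity_of_lasso(prefix: List[State], cycle: List[State], p: Dict[State, int]) -> int:
    """Par(pi): minimal priority among the states visited infinitely often."""
    return min(p[s] for s in inf_states(prefix, cycle))


PRIO = {"a": 1, "b": 2, "c": 0}
print(parity_of_lasso(["c"], ["a", "b"], PRIO))  # 1: odd; priority 0 of c is irrelevant
print(parity_of_lasso(["c", "a"], ["b"], PRIO))  # 2: even
```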

Strategies. Given a finite set $A$, a probability distribution on $A$ is a function $p\colon A \to [0, 1]$ s.t. $\sum_{a \in A} p(a) = 1$.
We denote the set of probability distributions on $A$ by $\mathcal{D}(A)$. A pure strategy for $\mathcal{P}_i$, $i \in \{1, 2\}$, in
$G_p$ is a function $\lambda_i\colon \mathsf{Prefs}_i(G_p) \to S$ s.t. for all $\rho \in \mathsf{Prefs}_i(G_p)$, we have $(\mathsf{Last}(\rho), \lambda_i(\rho)) \in E$. A
(behavioral) randomized strategy is a function $\lambda_i\colon \mathsf{Prefs}_i(G_p) \to \mathcal{D}(S)$ s.t. for all $\rho \in \mathsf{Prefs}_i(G_p)$, we have
$\{(\mathsf{Last}(\rho), s) \mid s \in S, \lambda_i(\rho)(s) > 0\} \subseteq E$. A pure strategy $\lambda_i$ for $\mathcal{P}_i$ has finite memory if it can be encoded by
a deterministic Moore machine $(M, m_0, \alpha_u, \alpha_n)$ where $M$ is a finite set of states (the memory of the strategy),
$m_0 \in M$ is the initial memory state, $\alpha_u\colon M \times S \to M$ is an update function, and $\alpha_n\colon M \times S_i \to S$ is the
next-action function. If the game is in $s \in S_i$ and $m \in M$ is the current memory value, then the strategy
chooses $s' = \alpha_n(m, s)$ as the next state of the game. When the game leaves a state $s \in S$, the memory is
updated to $\alpha_u(m, s)$. Formally, $(M, m_0, \alpha_u, \alpha_n)$ defines the strategy $\lambda_i$ s.t. $\lambda_i(\rho \cdot s) = \alpha_n(\hat{\alpha}_u(m_0, \rho), s)$ for all
$\rho \in S^*$ and $s \in S_i$, where $\hat{\alpha}_u$ extends $\alpha_u$ to sequences of states as expected. A pure strategy is memoryless if
$|M| = 1$, i.e., it does not depend on history but only on the current state of the game. Similar definitions hold
for finite-memory randomized strategies, s.t. the next-action function $\alpha_n$ is randomized, while the update
function $\alpha_u$ remains deterministic. We resp. denote by $\Lambda_i$, $\Lambda_i^{PF}$, $\Lambda_i^{PM}$, $\Lambda_i^{RM}$ the sets of general (i.e., possibly
randomized and infinite-memory), pure finite-memory, pure memoryless and randomized memoryless
strategies for player $\mathcal{P}_i$.
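A Moore machine of this kind can be represented directly. In the Python sketch below (hypothetical names; memory values of any type), calling the strategy on a prefix $\rho \cdot s$ replays $\hat{\alpha}_u$ over $\rho$ and then applies $\alpha_n$, following the formal definition; a practical implementation would rather keep the current memory value and update it incrementally.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

State = str


@dataclass
class MooreStrategy:
    """Pure finite-memory strategy given by a Moore machine (M, m0, alpha_u, alpha_n)."""
    m0: Any                                     # initial memory state
    update: Callable[[Any, State], Any]         # alpha_u : M x S -> M
    next_action: Callable[[Any, State], State]  # alpha_n : M x S_i -> S

    def __call__(self, prefix: Tuple[State, ...]) -> State:
        # lambda(rho . s) = alpha_n(hat_alpha_u(m0, rho), s)
        m = self.m0
        for s in prefix[:-1]:  # memory is updated each time the play leaves a state
            m = self.update(m, s)
        return self.next_action(m, prefix[-1])


# Hypothetical two-element memory M = {0, 1}: flip the bit whenever the play leaves a.
def toggle_on_a(m: int, s: State) -> int:
    return 1 - m if s == "a" else m


def choose(m: int, s: State) -> State:
    if s == "a":
        return "b" if m == 0 else "c"
    return "a"


strategy = MooreStrategy(0, toggle_on_a, choose)
print(strategy(("a",)), strategy(("a", "b", "a")), strategy(("a", "b", "a", "c", "a")))
# b c b
```

With $|M| = 2$, this machine alternates between the successors `b` and `c` of `a`, a behaviour that no memoryless strategy can produce.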

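Given a prefix $\rho \in \mathsf{Prefs}_i(G_p)$ belonging to player $\mathcal{P}_i$, and a strategy $\lambda_i \in \Lambda_i$ of this player, we define the
support of the probability distribution defined by $\lambda_i$ as $\mathsf{Supp}_{\lambda_i}(\rho) = \{s \in S \mid \lambda_i(\rho)(s) > 0\}$, with $\lambda_i(\rho)(s) = 1$
if $\lambda_i$ is pure and $\lambda_i(\rho) = s$. A play $\pi$ is said to be consistent with a strategy $\lambda_i$ of $\mathcal{P}_i$ if for all $n \geq 0$ s.t.
$\mathsf{Last}(\pi(n)) \in S_i$, we have $\mathsf{Last}(\pi(n + 1)) \in \mathsf{Supp}_{\lambda_i}(\pi(n))$. Given two strategies, $\lambda_1$ for $\mathcal{P}_1$ and $\lambda_2$ for $\mathcal{P}_2$, we
define $\mathsf{Outcome}_{G_p}(\lambda_1, \lambda_2) = \{\pi \in \mathsf{Plays}(G_p) \mid \pi \text{ is consistent with } \lambda_1 \text{ and } \lambda_2\}$, the set of possible outcomes
of the game. Note that if both strategies $\lambda_1$ and $\lambda_2$ are pure, we obtain a unique play $\pi = s_0 s_1 s_2 \dots$ s.t. for
all $j \geq 0$, $i \in \{1, 2\}$, if $s_j \in S_i$, then we have $s_{j+1} = \lambda_i(\pi(j))$.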

Given the initial state $s_{\mathsf{init}}$ and strategies for both players $\lambda_1 \in \Lambda_1$, $\lambda_2 \in \Lambda_2$, we obtain a Markov chain.
Thus, every event $\mathcal{A} \subseteq \mathsf{Plays}(G_p)$, a measurable set of plays, has a uniquely defined probability [38]. We
denote by $\mathbb{P}^{\lambda_1, \lambda_2}_{s_{\mathsf{init}}}(\mathcal{A})$ the probability that a play belongs to $\mathcal{A}$ when the game starts in $s_{\mathsf{init}}$ and is played
consistently with $\lambda_1$ and $\lambda_2$. Let $f\colon \mathsf{Plays}(G_p) \to \mathbb{R}$ be a measurable function; we denote by $\mathbb{E}^{\lambda_1, \lambda_2}_{s_{\mathsf{init}}}(f)$ the
expected value of function $f$ over a play when the game starts in $s_{\mathsf{init}}$ and is played consistently with $\lambda_1$
and $\lambda_2$. We use the same notions for prefixes by naturally extending them to their infinite counterparts.
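For the special case where both players use memoryless strategies (randomized for at least one of them) in a one-dimensional game, the induced Markov chain has state space $S$, and the expectation of the mean-payoff can be approximated numerically. For a finite Markov chain, the expected mean-payoff equals the limit of the Cesàro averages of the expected one-step weights; the Python sketch below simply truncates that average after `n_rounds` steps. The encoding and function name are hypothetical, and the result is only an approximation (an exact method would compute the stationary distributions instead).

```python
from typing import Dict, Tuple

State = str


def expected_mean_payoff(P: Dict[State, Dict[State, float]],
                         w: Dict[Tuple[State, State], float],
                         s_init: State, n_rounds: int = 5000) -> float:
    """Approximate the expected mean-payoff of a finite Markov chain.

    P[s][t] is the probability of moving from s to t and w[(s, t)] the weight
    of that edge. We average the expected weight of the first n_rounds steps.
    """
    dist = {s: 0.0 for s in P}
    dist[s_init] = 1.0
    total = 0.0
    for _ in range(n_rounds):
        # expected weight collected in this round under the current distribution
        total += sum(dist[s] * prob * w[(s, t)]
                     for s in P for t, prob in P[s].items())
        # push the distribution one step forward
        nxt = {s: 0.0 for s in P}
        for s in P:
            for t, prob in P[s].items():
                nxt[t] += dist[s] * prob
        dist = nxt
    return total / n_rounds


# Randomized memoryless choice in s0: stay (weight 2) or go to s1 (weight 0), each w.p. 1/2.
CHAIN = {"s0": {"s0": 0.5, "s1": 0.5}, "s1": {"s0": 1.0}}
WEIGHT = {("s0", "s0"): 2.0, ("s0", "s1"): 0.0, ("s1", "s0"): 0.0}
print(round(expected_mean_payoff(CHAIN, WEIGHT, "s0"), 3))  # 0.667 (exact value 2/3)
```

In this example the chain spends a long-run fraction $2/3$ of its time in `s0`, where the expected one-step weight is $1$, hence the value $2/3$.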

Objectives. An objective for $\mathcal{P}_1$ in $G_p$ is a set of plays $\phi \subseteq \mathsf{Plays}(G_p)$. We consider several kinds of objectives:

- Multi energy objectives. Given an initial energy vector $v_0 \in \mathbb{N}^k$, the objective
$\mathsf{PosEnergy}_{G_p}(v_0) = \{\pi \in \mathsf{Plays}(G_p) \mid \forall\, n \geq 0 : v_0 + \mathsf{EL}(\pi(n)) \in \mathbb{N}^k\}$ requires that the energy level in all dimensions stays
non-negative at all times.

- Multi mean-payoff objectives. Given a threshold vector $v \in \mathbb{Q}^k$, the objective
$\mathsf{MeanPayoff}_{G_p}(v) = \{\pi \in \mathsf{Plays}(G_p) \mid \mathsf{MP}(\pi) \geq v\}$ requires that for all dimensions $j$, the mean-payoff on this dimension is
at least $v(j)$.

- Parity objectives. Objective $\mathsf{Parity}_{G_p} = \{\pi \in \mathsf{Plays}(G_p) \mid \mathsf{Par}(\pi) \bmod 2 = 0\}$ requires that the minimum
priority visited infinitely often be even. When the set of priorities is restricted to $\{0, 1\}$, we have a Büchi
objective. Note that every multi-weighted game structure $G$ without parity can trivially be extended to
$G_p$ with $p\colon S \to \{0\}$.

- Combined objectives. Parity can naturally be combined with multi mean-payoff and multi energy objectives,
resp. yielding $\mathsf{MeanPayoff}_{G_p}(v) \cap \mathsf{Parity}_{G_p}$ and $\mathsf{PosEnergy}_{G_p}(v_0) \cap \mathsf{Parity}_{G_p}$.
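For an ultimately periodic play $\rho \cdot \eta^\omega$, membership in these objectives can be decided from finitely many positions. The energy level during the $m$-th pass through $\eta$ equals the level during the first pass plus $m$ times the energy level of one loop, so $\mathsf{PosEnergy}_{G_p}(v_0)$ holds iff the first pass stays non-negative (with credit $v_0$) and one loop does not lose energy in any dimension; the mean-payoff is the loop's energy level divided by its length; and the parity is the minimal priority on the cycle. The Python sketch below implements these three checks under these assumptions (lasso-shaped plays, hypothetical names).

```python
from fractions import Fraction
from typing import Dict, Iterator, List, Sequence, Tuple

State = str
Vec = Tuple[int, ...]
Weights = Dict[Tuple[State, State], Vec]


def running_levels(seq: List[State], w: Weights, k: int) -> Iterator[Vec]:
    """Yield EL of every prefix of seq, starting with the empty sum."""
    level = [0] * k
    yield tuple(level)
    for s, t in zip(seq, seq[1:]):
        for j in range(k):
            level[j] += w[(s, t)][j]
        yield tuple(level)


def in_pos_energy(prefix: List[State], cycle: List[State], w: Weights, v0: Vec) -> bool:
    """Is prefix . cycle^omega in PosEnergy(v0)?"""
    k = len(v0)
    # all positions of the first pass through the cycle (plus the return to its start)
    stem = prefix + cycle + [cycle[0]]
    if any(v0[j] + el[j] < 0 for el in running_levels(stem, w, k) for j in range(k)):
        return False
    # every later pass adds EL(loop), so the loop must not lose energy
    loop_el = list(running_levels(cycle + [cycle[0]], w, k))[-1]
    return all(x >= 0 for x in loop_el)


def in_mean_payoff(cycle: List[State], w: Weights, nu: Sequence[Fraction]) -> bool:
    """Is MP(prefix . cycle^omega) >= nu componentwise?"""
    loop_el = list(running_levels(cycle + [cycle[0]], w, len(nu)))[-1]
    return all(Fraction(x, len(cycle)) >= t for x, t in zip(loop_el, nu))


def in_parity(cycle: List[State], p: Dict[State, int]) -> bool:
    """Is the minimal priority seen infinitely often even?"""
    return min(p[s] for s in cycle) % 2 == 0


WEIGHTS = {("a", "b"): (-1, 2), ("b", "a"): (2, -1)}
PRIO = {"a": 2, "b": 3}
print(in_pos_energy(["a"], ["b", "a"], WEIGHTS, (1, 0)),   # True
      in_pos_energy(["a"], ["b", "a"], WEIGHTS, (0, 0)),   # False: first edge needs credit
      in_mean_payoff(["b", "a"], WEIGHTS, (Fraction(1, 2), Fraction(0))),  # True
      in_parity(["b", "a"], PRIO))                         # True
```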

Sure, satisfaction and expectation semantics. A strategy $\lambda_1$ for $\mathcal{P}_1$ is surely winning for an objective
$\phi$ in $G_p$ if for all plays $\pi \in \mathsf{Plays}(G_p)$ that are consistent with $\lambda_1$, we have $\pi \in \phi$. When at least one of the
players plays a randomized strategy, the notion of sure winning is in general too restrictive and inadequate,
as the set of consistent plays that do not belong to $\phi$ may have zero probability measure. Therefore, it is
useful to use satisfaction or expectation criteria. Let $\lambda_1 \in \Lambda_1$ be the strategy of $\mathcal{P}_1$.

- Given a threshold $\alpha \in [0, 1]$ and a measurable objective $\phi \subseteq \mathsf{Plays}(G_p)$, $\alpha$-satisfaction asks that for all
$\lambda_2 \in \Lambda_2$, we have $\mathbb{P}^{\lambda_1, \lambda_2}_{s_{\mathsf{init}}}(\phi) \geq \alpha$. If $\lambda_1$ satisfies $\phi$ with probability $\alpha = 1$, we say that $\lambda_1$ is almost-surely
winning for $\phi$ in $G_p$.

- Given a threshold $\beta \in \mathbb{Q}^k$ and a measurable function $f\colon \mathsf{Plays}(G_p) \to \mathbb{R}^k$, $\beta$-expectation asks that for all $\lambda_2 \in \Lambda_2$, we
have $\mathbb{E}^{\lambda_1, \lambda_2}_{s_{\mathsf{init}}}(f) \geq \beta$ (componentwise).

Note that energy objectives are naturally more inclined towards satisfaction semantics, as they model safety
properties.

Strategy synthesis problem. For multi energy parity games, the problem is to synthesize a finite initial
credit $v_0 \in \mathbb{N}^k$ and a pure finite-memory strategy $\lambda_1^{pf} \in \Lambda_1^{PF}$ that is surely winning for $\mathcal{P}_1$ in $G_p$ for
the objective $\mathsf{PosEnergy}_{G_p}(v_0) \cap \mathsf{Parity}_{G_p}$, if one exists. So, the initial credit is not fixed, but is part of
the strategy to synthesize. For multi mean-payoff games, given a threshold $v \in \mathbb{Q}^k$, the problem is to
synthesize a pure finite-memory strategy $\lambda_1^{pf} \in \Lambda_1^{PF}$ that is surely winning for $\mathcal{P}_1$ in $G_p$ for the objective
$\mathsf{MeanPayoff}_{G_p}(v) \cap \mathsf{Parity}_{G_p}$, if one exists. Note that multi energy and multi mean-payoff games are equivalent
for finite-memory strategies, while in general, infinite memory may be necessary for the latter [17].

Trading finite memory for randomness. We study when finite memory can be traded for randomization.
The question is: given a strategy $\lambda_1^{pf} \in \Lambda_1^{PF}$ which ensures surely winning of some objective $\phi$, does there
exist a strategy $\lambda_1^{rm} \in \Lambda_1^{RM}$ which ensures almost-surely winning for the same objective $\phi$? For mean-payoff
objectives, one can also ask for a weaker equivalence, that is: can randomized memoryless strategies achieve
the same expectation as pure finite-memory ones?

3 Optimal memory bounds 

In this section, we establish optimal memory bounds for pure finite-memory winning strategies on multi- 
dimensional energy parity games (MEPGs). Also, as a corollary, we obtain results for pure finite-memory 
winning strategies on multi-dimensional mean-payoff parity games (MMPPGs). We show that single exponential
memory is both sufficient and necessary for winning strategies. Additionally, we show how the parity
condition in a MEPG can be removed by adding additional energy dimensions.

Multi energy parity games. A sample game is depicted in Fig. 1. The key point in the upper bound
proof on memory is to understand that for V\ to win a multi energy parity game, he must be able to force 
cycles whose energy level is positive in all dimensions and whose minimal parity is even. As stated in the 
next lemma, finite-memory strategies are sufficient for multi energy parity games for both players. 

Lemma 1 (Extension of [17, Lemma 2 and 3]). If $\mathcal{P}_1$ wins a multi energy parity game, then he has a
pure finite-memory winning strategy. If $\mathcal{P}_2$ wins a multi energy parity game, then he has a pure memoryless
winning strategy.

Proof. The first part of the result follows using the standard well-quasi ordering argument (straightforward 
extension of [17, Lemma 2]). The second part follows by the classical edge induction argument: Lemma 3 of 
[17] and Lemma 3 of [14] show the result using edge induction for multi energy and energy parity games, 
respectively. Repeating the arguments of Lemma 3 of [14], and replacing the part on single energy objectives 
by the argument of Lemma 3 of [17] for multi energy objectives, we obtain the desired result. □ 

By Lemma 1, we know that w.l.o.g. both players can be restricted to play pure finite memory strategies. 
The property on the cycles can then be formalized as follows. 

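Lemma 2. Let $G_p = (S_1, S_2, s_{\mathsf{init}}, E, k, w, p)$ be a multi energy parity game. Let $\lambda_1^{pf} \in \Lambda_1^{PF}$ be a winning
strategy of $\mathcal{P}_1$ for initial credit $v_0 \in \mathbb{N}^k$. Then, for all $\lambda_2^{pm} \in \Lambda_2^{PM}$, the outcome is a regular play $\pi = \rho \cdot (\eta_\infty)^\omega$,
with $\rho \in \mathsf{Prefs}(G_p)$, $\eta_\infty \in S^+$, s.t. $\mathsf{EL}(\eta_\infty) \geq 0$ and $\mathsf{Par}(\pi) = \min\{p(s) \mid s \in \eta_\infty\}$ is even.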

Proof. Recall that both players play with pure finite-memory strategies. Therefore, a finite number of
decisions are made and the outcome is a regular play $\pi = \rho \cdot (\eta_\infty)^\omega$. Note that $\mathsf{EL}(\rho)$ does not have to be positive,
as $\mathcal{P}_1$ may have $v_0 + \mathsf{EL}(\rho) \geq 0$ thanks to the initial credit. Similarly, priorities of states visited only in $\rho$ have no impact on winning as they are
only visited a finite number of times.

First, suppose $\mathsf{EL}(\eta_\infty) < 0$ on some dimension $1 \leq j \leq k$. Then, after $m > 0$ cycles, for some $n > 0$, the
energy level will be $\mathsf{EL}(\pi(n)) = \mathsf{EL}(\rho \cdot (\eta_\infty)^m) = \mathsf{EL}(\rho) + m \cdot \mathsf{EL}(\eta_\infty)$. Since $v_0$ is finite and $m \to \infty$, there
exist some $m, n > 0$ s.t. $v_0 + \mathsf{EL}(\pi(n)) < 0$ on dimension $j$, and $\lambda_1^{pf}$ is not winning.

Second, suppose $\min\{p(s) \mid s \in \eta_\infty\}$ is odd. Since the set of states visited infinitely often is exactly the
set of states in $\eta_\infty$, this implies that $\mathsf{Par}(\pi)$ is odd, and thus $\lambda_1^{pf}$ is not winning. □
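The first case of the proof can be made fully explicit. The following bound is our own elaboration of the argument above; it uses only the fact that weights are integers, so a negative component of $\mathsf{EL}(\eta_\infty)$ is at most $-1$:

$$\mathsf{EL}(\eta_\infty)(j) \leq -1 \;\Longrightarrow\; \forall\, m > \max\bigl(0,\ v_0(j) + \mathsf{EL}(\rho)(j)\bigr):\quad v_0(j) + \mathsf{EL}(\rho)(j) + m \cdot \mathsf{EL}(\eta_\infty)(j) \;\leq\; v_0(j) + \mathsf{EL}(\rho)(j) - m \;<\; 0.$$

Hence dimension $j$ becomes negative after at most $\max(0, v_0(j) + \mathsf{EL}(\rho)(j)) + 1$ iterations of the cycle.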
Fig. 1. Two-dimensional energy parity game and even-parity self-covering tree representing an arbitrary finite-memory
winning strategy. Circle states belong to $\mathcal{P}_1$, square states to $\mathcal{P}_2$.

With the notion of regular play of Lemma 2, we generalize the notion of self-covering path to include the
parity condition. We show here that, if such a path exists, then the lengths of its cycle and of the prefix needed
to reach it can be bounded. Bounds on the strategy follow. In [34], Rackoff showed how to bound the length of
self-covering paths in Vector Addition Systems (VAS). This work was extended to Vector Addition Systems
with States (VASS) by Rosier and Yen [36]. Recently, Brázdil et al. introduced reachability games on VASS
and the notion of self-covering trees [10]. Their Zero-safety problem with $\omega$ initial marking is equivalent to
multi energy games with weights in $\{-1, 0, 1\}$, and without the parity condition. They showed that if winning
strategies exist for $\mathcal{P}_1$, then some of them can be represented as self-covering trees of bounded depth. Trees
have to be considered instead of paths, as in a game setting all the possible choices of the adversary ($\mathcal{P}_2$)
must be considered. Here, we extend the notion of self-covering trees to even-parity self-covering trees, in
order to handle parity objectives.

Even-parity self-covering tree. An even-parity self-covering tree (epSCT) for $s \in S$ is a finite tree
$T = (Q, R)$, where $Q$ is the set of nodes, $\Theta\colon Q \to S \times \mathbb{Z}^k$ is a labeling function and $R \subseteq Q \times Q$ is the set of
edges, s.t.

- The root of $T$ is labeled $(s, (0, \dots, 0))$.

- If $\varsigma \in Q$ is not a leaf, then let $\Theta(\varsigma) = (t, u)$, $t \in S$, $u \in \mathbb{Z}^k$, s.t.

• if $t \in S_1$, then $\varsigma$ has a unique child $\vartheta$ s.t. $\Theta(\vartheta) = (t', u')$, $(t, t') \in E$ and $u' = u + w(t, t')$;

• if $t \in S_2$, then there is a bijection between children of $\varsigma$ and edges of the game leaving $t$, s.t. for each
successor $t' \in S$ of $t$ in the game, there is one child $\vartheta$ of $\varsigma$ s.t. $\Theta(\vartheta) = (t', u')$, $u' = u + w(t, t')$.

- If $\varsigma$ is a leaf, then let $\Theta(\varsigma) = (t, u)$ s.t. there is some ancestor $\vartheta$ of $\varsigma$ in $T$ s.t. $\Theta(\vartheta) = (t, u')$, with $u' \leq u$,
and the downward path from $\vartheta$ to $\varsigma$, denoted by $\vartheta \rightsquigarrow \varsigma$, has minimal priority even. We say that $\vartheta$ is an
even-descendance energy ancestor of $\varsigma$.

Intuitively, each path from root to leaf is a self-covering path of even parity in the game graph so that
plays unfolding according to such a tree correspond to winning plays of Lemma 2. Thus, the epSCT fixes
how $\mathcal{P}_1$ should react to actions of $\mathcal{P}_2$ in order to win the MEPG (Fig. 1). Note that as the tree is finite, one
can take, in each dimension, the most negative value appearing in a node label to compute an initial credit
for which there is a winning strategy (i.e., the one described by the tree). In particular, let $W$ denote the
maximal absolute weight appearing on an edge in $G_p$. Then, for an epSCT $T$ of depth $l$, it is straightforward
to see that the maximal initial credit required is at most $l \cdot W$ as the maximal decrease at each level of
the tree is bounded by $W$. We suppose $W > 0$ as otherwise, any strategy of $\mathcal{P}_1$ is winning for the energy
objective, for any initial credit vector $v_0 \in \mathbb{N}^k$.
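The credit computation just described is straightforward once the node labels are available. In the Python sketch below (hypothetical names; labels are given as the energy components $u$ of $\Theta$), the credit is the smallest $v_0 \in \mathbb{N}^k$ with $v_0 + u \geq 0$ for every label $u$. As we read the construction, this credit suffices because, when $\mathcal{P}_1$ follows the tree and jumps from a leaf back to a covering ancestor, the jump only adds the non-negative surplus $u - u'$, so the actual energy level never drops below $v_0$ plus the label of the current node.

```python
from typing import List, Tuple

Vec = Tuple[int, ...]


def initial_credit(labels: List[Vec]) -> Vec:
    """Smallest v0 in N^k such that v0 + u >= 0 for every node label u of the tree."""
    k = len(labels[0])
    return tuple(max(0, -min(u[j] for u in labels)) for j in range(k))


def credit_bound(depth: int, max_abs_weight: int) -> int:
    """The l * W bound: along a branch of length l, each edge costs at most W per dimension."""
    return depth * max_abs_weight


# Energy components of the node labels of a small, hypothetical epSCT of depth 4 with W = 2.
LABELS = [(0, 0), (-1, 2), (1, 1), (-3, 0), (2, -2)]
print(initial_credit(LABELS), credit_bound(4, 2))  # (3, 2) 8
```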



Let us explicitly state how $\mathcal{P}_1$ can deploy a strategy $\lambda_1^{T} \in \Lambda_1^{PF}$ based on an epSCT $T = (Q, R)$. We
refer to such a strategy as an epSCT strategy. It consists in following a path in the tree $T$, moving a pebble
from node to node and playing in the game depending on edges taken by this pebble. Each time a node $\varsigma$
s.t. $\Theta(\varsigma) = (t, u)$ is encountered, we do the following.

- If $\varsigma$ is a leaf, the pebble directly goes up to its oldest even-descendance energy ancestor $\vartheta$. By oldest we
mean the first encountered when going down in the tree from the root. Note that this choice is arbitrary,
in an effort to ease the following proof formulations, as any one would suit.

- Otherwise, if $\varsigma$ is not a leaf,

• if $t \in S_2$ and $\mathcal{P}_2$ plays state $t' \in S$, the pebble is moved along the edge going to the only child $\vartheta$ of $\varsigma$
s.t. $\Theta(\vartheta) = (t', u')$, $u' = u + w(t, t')$;

• if $t \in S_1$, the pebble moves to the only child $\vartheta$ of $\varsigma$, with $\Theta(\vartheta) = (t', u')$, and the strategy of $\mathcal{P}_1$ is to choose the
state $t'$ in the game.

If such an epSCT $T$ of depth $l$ exists for a game $G_p$, then $\mathcal{P}_1$ can play the strategy $\lambda_1^{T} \in \Lambda_1^{PF}$ to win the
game with initial credit bounded by $l \cdot W$.
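The epSCT strategy just described can be sketched in Python. This is an illustration under our own assumptions, not code from the paper: nodes are stored in a dictionary with parent pointers, `Node`, `oldest_even_ancestor` and `EpSCTStrategy` are hypothetical names, and the memory of the strategy is simply the position of the pebble, so it uses at most $|Q|$ memory states.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

Vec = Tuple[int, ...]


@dataclass
class Node:
    state: str                   # first component of Theta(node)
    energy: Vec                  # second component of Theta(node)
    parent: Optional[int] = None
    children: List[int] = field(default_factory=list)


def oldest_even_ancestor(tree: Dict[int, Node], leaf: int, prio: Dict[str, int]) -> int:
    """Oldest even-descendance energy ancestor of a leaf."""
    path = [leaf]
    while tree[path[-1]].parent is not None:
        path.append(tree[path[-1]].parent)
    path.reverse()                                   # root ... leaf
    target = tree[leaf]
    for i, anc in enumerate(path[:-1]):              # scan from the root downwards
        node = tree[anc]
        same_state = node.state == target.state
        covered = all(a <= b for a, b in zip(node.energy, target.energy))
        if same_state and covered and min(prio[tree[n].state] for n in path[i:]) % 2 == 0:
            return anc
    raise ValueError("not an epSCT: leaf without even-descendance energy ancestor")


class EpSCTStrategy:
    """P1 strategy that moves a pebble in the tree; its memory is the pebble position."""

    def __init__(self, tree: Dict[int, Node], root: int, prio: Dict[str, int], p1_states: Set[str]):
        self.tree, self.prio, self.p1_states = tree, prio, p1_states
        self.pebble = root

    def step(self, p2_choice: Optional[str] = None) -> str:
        """Advance one round and return the new game state; p2_choice is P2's move in P2 states."""
        node = self.tree[self.pebble]
        if node.state in self.p1_states:
            nxt = node.children[0]                   # unique child: P1's move
        else:                                        # follow the child matching P2's move
            nxt = next(c for c in node.children if self.tree[c].state == p2_choice)
        self.pebble = nxt
        if not self.tree[nxt].children:              # leaf: jump back up the tree
            self.pebble = oldest_even_ancestor(self.tree, nxt, self.prio)
        return self.tree[self.pebble].state


# Hypothetical game: a, c owned by P1, b owned by P2; 1-D weights
# a->b: -1, b->a: +2, b->c: 0, c->a: +1; priorities a: 2, b: 3, c: 4.
TREE = {
    0: Node("a", (0,), None, [1]),
    1: Node("b", (-1,), 0, [2, 3]),
    2: Node("a", (1,), 1),        # leaf, covered by the root
    3: Node("c", (-1,), 1, [4]),
    4: Node("a", (0,), 3),        # leaf, covered by the root
}
s = EpSCTStrategy(TREE, 0, {"a": 2, "b": 3, "c": 4}, {"a", "c"})
print([s.step(), s.step("c"), s.step(), s.step(), s.step("a")])
# ['b', 'c', 'a', 'b', 'a']
```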

Bounding the depth of epSCTs. Consider a multi energy game without parity. Then, the priority
condition on downward paths from ancestor to leaf is not needed and self-covering trees (i.e., epSCTs without
the condition on priorities) suffice to describe winning strategies. One can bound the size of SCTs using
results on the size of solutions for linear diophantine equations (i.e., with integer variables) [7]. In particular,
recent work on reachability games over VASS with weights $\{-1, 0, 1\}$, Lemma 7 of [10], states that if $\mathcal{P}_1$
has a winning strategy on a VASS, then he can exhibit one that can be described as an SCT whose depth
is at most $l = 2^{(d-1)\cdot|S|} \cdot (|S| + 1)^{c \cdot k^2}$, where $c$ is a constant independent of the considered VASS and $d$ its
branching degree (i.e., the highest number of outgoing edges on any state). Naive use of this bound for multi
energy games with arbitrary integer weights would induce a triple exponential bound for memory. Indeed,
recall that $W$ denotes the maximal absolute weight that appears in a game $G_p = (S_1, S_2, s_{\mathsf{init}}, E, k, w, p)$. A
straightforward translation of a game with arbitrary weights into an equivalent game that uses only weights
in $\{-1, 0, 1\}$ induces a blow-up by $W$ in the size of the state space, and thus an exponential blow-up by $W$
in the depth of the tree, which becomes doubly exponential as we have

$$l = 2^{(d-1)\cdot W \cdot |S|} \cdot (W \cdot |S| + 1)^{c \cdot k^2} = 2^{(d-1)\cdot 2^{V} \cdot |S|} \cdot (2^{V} \cdot |S| + 1)^{c \cdot k^2},$$

where $V$ denotes the number of bits used by the encoding of $W$. Moreover, the width of the tree increases
as $d^l$, i.e., it increases exponentially with the depth. So straight application of previous results provides an
overall tree of triple exponential size. In this paper we improve this bound and prove a single exponential
upper bound, even for multi energy parity games. We proceed in two steps, first studying the depth of the
epSCT, and then showing how to compress the tree into a directed acyclic graph (DAG) of single exponential
size.

Lemma 3. Let $G_p = (S_1, S_2, s_{\mathsf{init}}, E, k, w, p)$ be a multi energy parity game s.t. $W$ is the maximal absolute
weight appearing on an edge and $d$ the branching degree of $G_p$. Suppose there exists a finite-memory winning
strategy for $\mathcal{P}_1$. Then there is an even-parity self-covering tree for $s_{\mathsf{init}}$ of depth at most
$l = 2^{(d-1)\cdot|S|} \cdot (W \cdot |S| + 1)^{c \cdot k^2}$, where $c$ is a constant independent of $G_p$.

Lemma 3 eliminates the exponential blow-up in depth induced by a naive coding of arbitrary weights
into $\{-1, 0, 1\}$ weights, and implies an overall doubly exponential upper bound. Our proof is a generalization
of [10, Lemma 7], using a more refined analysis to handle both parity and arbitrary integer weights. The 
idea is the following. First, consider the one-player case. The epSCT is reduced to a path. By Lemma 2, it 
is composed of a finite prefix, followed by an infinitely repeated sequence of positive energy level and even 
minimal priority. The point is to bound the length of such a sequence by eliminating cycles that are not 
needed for energy or parity. Second, to extend the result to two-player games, we use an induction on the
number of choices available for $\mathcal{P}_2$ in a given state. Intuitively, we show that if $\mathcal{P}_1$ can win with an epSCT
$T_A$ when $\mathcal{P}_2$ plays edges from a set $A$ in a state $s$, and if he can also win with an epSCT $T_B$ when $\mathcal{P}_2$ plays
edges from a set $B$, then he can win when $\mathcal{P}_2$ chooses edges from both $A$ and $B$, with an epSCT whose depth
is bounded by the sum of depths of $T_A$ and $T_B$.

Proof. The proof is made in two steps. First, we consider the one-player case, where $S_2 = \emptyset$. Second, we use
an induction scheme over the choice degree of $\mathcal{P}_2$ to extend our results to the two-player case.

We start with $S_2 = \emptyset$, the one-player game. By Lemma 2, a winning play is of the form $\pi = \rho \cdot (\eta_\infty)^\omega$ s.t.
$\mathsf{EL}(\eta_\infty) \geq 0$ and $\mathsf{Par}(\pi) = \min\{p(s) \mid s \in \eta_\infty\}$ is even. Notice that such a play corresponds to the epSCT
defined above, as it reduces to an even-parity self-covering path $(s_{\mathsf{init}}, (0, \dots, 0)) \rightsquigarrow (s, u) \rightsquigarrow (s, u')$ with
$u' \geq u$. Therefore its existence is guaranteed and it remains to bound its length. Given such a path, the idea
is to eliminate unnecessary cycles, in order to reduce its length while maintaining the needed properties (i.e.,
non-negative energy and even minimal priority). First, notice that cycles in the sub-path $(s_{\mathsf{init}}, (0, \dots, 0)) \rightsquigarrow (s, u)$
can be trivially erased as they are only visited a finite number of times and thus (a) the initial credit can
compensate for the loss of their potential positive energy effect, and (b) they do not contribute to the parity.
Now consider the sub-path $(s, u) \rightsquigarrow (s, u')$. Since it induces a winning play, its minimal priority is even. Let
$p_m$ be this priority. We may suppose w.l.o.g. that $p(s) = p_m$, otherwise it suffices to shift this sub-path to
$(s', v) \rightsquigarrow (s', v')$ for some state $s'$ s.t. $p(s') = p_m$ and $v' \geq v$, and add the sub-path $(s, u) \rightsquigarrow (s', v)$ to the
finite prefix. Now we may eliminate each cycle of $(s, u) \rightsquigarrow (s, u')$ safely with regard to the parity objective as
they only contain states with greater or equal priority. Thus, we only need to take care of the energy, and
fall under the scope of [10, Lemma 15] for the special case of weights in $\{-1, 0, 1\}$, where an upper bound
$h(|S|, k) = (|S| + 1)^{c \cdot k^2}$ on the length of such a path is shown.

We claim that for a one-player game $G_p$ with weights in $\{-W, -W+1, \ldots, W-1, W\}$, the following upper bound is obtained:

$$h(W, |S|, k) = (W \cdot |S| + 1)^{c' \cdot k^2}.$$

Indeed, one can translate $G_p = (S_1, S_2, s_{init}, E, k, w, p)$ into an equivalent game $G'_p = (S'_1, S'_2, s_{init}, E', k, w', p')$. In $G'_p$, each edge of $G_p$ is split into at most $W$ edges, with at most $(W-1)$ dummy states in between, so that each edge of $G'_p$ only uses weights in $\{-1, 0, 1\}$. Let $S_d$ denote the set of these added dummy states. We define this translation $Tr: G_p \mapsto G'_p$ as follows:

- $Tr(S_1) = S_1 \cup S_d$, $Tr(S_2) = S_2$, and $Tr(s_{init}) = s_{init}$;
- $Tr(E) = \bigcup_{(s,t) \in E} Tr((s,t))$ and $Tr(k) = k$;
- $Tr(w) = w' : E' \to \{-1, 0, 1\}^k$ and $Tr(p) = p' : S' \to \mathbb{N}$.

These components satisfy

$$\forall (s,t) \in E,\quad m = \max\{|w((s,t))(j)| \mid 1 \le j \le k\} - 1,$$
$$Tr((s,t)) = \{(s, s_d^1), (s_d^1, s_d^2), \ldots, (s_d^{m-1}, s_d^m), (s_d^m, t)\},$$
$$\text{s.t. } \forall j > 0,\quad s_d^j \in S_d \;\wedge\; p'(s_d^j) = p(s) \;\wedge\; \sum_{(q,r) \in Tr((s,t))} w'(q,r) = w(s,t).$$

To be formally correct, we must add two conditions. For all $s_d \in S_d$, we have $degree_{in}(s_d) = degree_{out}(s_d) = 1$. For all $s \notin S_d$, we have $p'(s) = p(s)$.

This translation does not alter the outcome of the game. Each edge in $G_p$ has a unique corresponding path in $G'_p$ that preserves the weights and the visited priorities and offers no added choice to $\mathcal{P}_1$. Since $G_p$ has $|E| \le |S|^2$ edges, and we add at most $(W-1)$ dummy states for each edge of $G_p$, we have $|S'| \le |S| + |S|^2 \cdot (W-1) \le |S|^2 \cdot W$. Therefore, by applying [10, Lemma 15] on $G'_p$, we obtain the following upper bound:

$$h(|S'|, k) = (|S'| + 1)^{c \cdot k^2} \le (|S|^2 \cdot W + 1)^{c \cdot k^2} \le (W \cdot |S| + 1)^{c' \cdot k^2} = h(W, |S|, k)$$

for some constant $c'$ that is independent of $G_p$. For example, $c' = 2 \cdot c$ suffices, since $|S|^2 \cdot W + 1 \le (W \cdot |S| + 1)^2$.
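The edge-splitting translation $Tr$ can be illustrated on explicit data. The following is a minimal Python sketch. It assumes weights are integer tuples and priorities an integer dictionary. The dummy-state naming scheme `"s->t#j"` and the function names are hypothetical choices for this illustration. Each unit step moves every coordinate one unit towards 0. Hence at most $W$ edges (at most $W - 1$ dummy states) are created per original edge, and the unit weights along the path sum to $w(s,t)$.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]


def split_edge(s: str, t: str, weight: Sequence[int], priority_of_s: int):
    """Split edge (s, t) with weight in {-W, ..., W}^k into unit-weight edges.

    Returns (edges, unit_weights, dummy_priorities). Dummy states get the
    hypothetical names "s->t#j" and inherit the priority of s, as in Tr.
    """
    n_edges = max(1, max(abs(x) for x in weight))   # = m + 1 in the text
    path = [s] + [f"{s}->{t}#{j}" for j in range(1, n_edges)] + [t]
    remaining = list(weight)
    unit_weights: List[List[int]] = []
    for _ in range(n_edges):
        # Move every component one unit towards 0; the sum over the path is w(s, t).
        step = [(r > 0) - (r < 0) for r in remaining]
        remaining = [r - d for r, d in zip(remaining, step)]
        unit_weights.append(step)
    edges = list(zip(path, path[1:]))
    dummy_priorities = {q: priority_of_s for q in path[1:-1]}
    return edges, unit_weights, dummy_priorities


def translate(weights: Dict[Edge, Sequence[int]], priority: Dict[str, int]):
    """Apply the splitting to every edge; returns (w', p') of the translated game."""
    new_weights: Dict[Edge, List[int]] = {}
    new_priority = dict(priority)
    for (s, t), w in weights.items():
        edges, unit_weights, dummies = split_edge(s, t, w, priority[s])
        new_weights.update(zip(edges, unit_weights))
        new_priority.update(dummies)
    return new_weights, new_priority


edges, unit_w, dummies = split_edge("s", "t", (3, -2, 0), priority_of_s=1)
# edges  == [("s", "s->t#1"), ("s->t#1", "s->t#2"), ("s->t#2", "t")]
# unit_w == [[1, -1, 0], [1, -1, 0], [1, 0, 0]]
```

The dummy states have a single successor, so assigning them to $\mathcal{P}_1$ (as $Tr(S_1) = S_1 \cup S_d$ does) gives no player any new choice.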

Now, consider $S_2 \neq \emptyset$.

(I) We extend [10, Lemma 16] to parity. This will help us establish an induction scheme over the choice degree of $\mathcal{P}_2$. Suppose $s \in S_2$ has more than one outgoing edge. Let $r = (s,t) \in E$ be one of them and let $R \subseteq E$ denote the nonempty set of other outgoing edges. Let $G_p^r$ (resp. $G_p^R$) be the game induced when removing $R$ (resp. $r$) from $G_p$. Suppose that:

- (a) $s$ is winning for $\mathcal{P}_1$ in $G_p^R$ for initial credit $v_R \in \mathbb{N}^k$, and
- (b) there exists some state $s' \in S$ s.t. $s'$ is winning for $\mathcal{P}_1$ in $G_p^r$ for initial credit $v_r \in \mathbb{N}^k$.

We claim that $s'$ is winning in $G_p$ for initial credit $v_0 = v_r + v_R$. Indeed, let $\lambda_1^r$ and $\lambda_1^R$ resp. denote winning strategies for $\mathcal{P}_1$ in $G_p^r$ and $G_p^R$. Let $\mathcal{P}_1$ use the following strategy. $\mathcal{P}_1$ plays $\lambda_1^r$ as long as $\mathcal{P}_2$ does not play any edge of $R$. If such an edge is played, $\mathcal{P}_1$ switches to $\lambda_1^R$ and plays it until $\mathcal{P}_2$ plays edge $r$ again. At that point $\mathcal{P}_1$ switches back to $\lambda_1^r$, and so on. In this way, the outcome of the game is guaranteed to be a play $\pi = s' \ldots s \ldots s \ldots s \ldots$ obtained by merging two plays:

- a play consistent with $\lambda_1^r$ over $G_p^r$, whose energy level is bounded from below by $-v_r$ at all times, and
- a play consistent with $\lambda_1^R$ over $G_p^R$, whose energy level is bounded from below by $-v_R$ at all times.

Therefore, the combined energy level of any prefix $\rho$ of this play is bounded from below by $(-v_r - v_R)$, as positive cycles in $G_p^r$ and $G_p^R$ remain positive in $G_p$. Furthermore, the parity condition is preserved in $G_p$. Indeed, suppose it is not. Then some state visited infinitely often in the outcome has minimal and odd priority. However, the outcome results from merging plays resp. consistent with $\lambda_1^r$ and $\lambda_1^R$. Hence one of those strategies yields an odd minimal priority, which contradicts the fact that both are winning. This proves the claim.

(II) We apply the induction scheme of [10, Lemma 18] to the choice degree of $\mathcal{P}_2$, defined as $r = |\{(s,t) \in E \mid s \in S_2\}| - |S_2| \le (d-1) \cdot |S|$. Notice that our translation $Tr: G_p \mapsto G'_p$ leaves this choice degree unchanged. We claim that for a winning state $s'$, there is an epSCT of depth bounded by $2^r \cdot h(W, |S|, k)$. We have already proved this claim for the base case $r = 0$, which is similar to $S_2 = \emptyset$. So assume the claim holds for $r$; it remains to prove that it holds for $r + 1$. Let $s \in S_2$ be a state with at least two outgoing edges. As before, we define $G_p^r$ and $G_p^R$. Clearly, the choice degree of $\mathcal{P}_2$ is at most $r$ in both games. Let $s'$ be a winning state in $G_p$. Since $\mathcal{P}_2$ has fewer choices in both $G_p^r$ and $G_p^R$, $s'$ is still winning in those games. By hypothesis, epSCTs exist in both games and have depth bounded by $2^r \cdot h(W, |S|, k)$. If either of them does not contain the state $s$, then the claim is verified. Now suppose the two epSCTs for $G_p^r$ and $G_p^R$ both contain state $s$. Notice that $s$ is winning in those two games. As such, $s$ is the root of two respective epSCTs of depth less than $2^r \cdot h(W, |S|, k)$. Applying (I) to states $s'$ and $s$, we get an epSCT for $s'$ in $G_p$ of depth $2 \cdot 2^r \cdot h(W, |S|, k) = 2^{r+1} \cdot h(W, |S|, k)$, which concludes the proof. □
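The induction in (II) amounts to a simple recurrence. Write $D(r)$ for the maximal depth needed for an epSCT when the choice degree of $\mathcal{P}_2$ is $r$; this notation is introduced here for illustration only. The base case and step (I) then give the bound below, using $r \le (d-1) \cdot |S|$. Up to the name of the constant, this is the depth bound $l$ used in Lemma 4 and Lemma 5.

```latex
\begin{align*}
D(0) &\le h(W, |S|, k), \qquad D(r+1) \le 2 \cdot D(r) \\
\Longrightarrow\quad D(r) &\le 2^{r} \cdot h(W, |S|, k)
  \;\le\; 2^{(d-1)\cdot|S|} \cdot (W \cdot |S| + 1)^{c' \cdot k^2}
\end{align*}
```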

From multi energy parity games to multi energy games. Let $G_p$ be a MEPG and assume that $\mathcal{P}_1$ has a winning strategy in that game. By Lemma 3, there exists an epSCT whose depth is bounded by $l$. As a direct consequence of that bounded depth, $\mathcal{P}_1$ enforces a stronger objective than the parity objective by playing the strategy prescribed by the epSCT. Namely, this strategy ensures that $\mathcal{P}_1$ "never visits more than $l$ states of odd priorities before seeing a smaller even priority", which is a safety objective. Then, the parity condition can be transformed into additional energy dimensions.

Our transformation shares ideas with the classical transformation of parity objectives into safety objectives, first proposed in [4] (see also [23, Lemma 6.4]). It is nevertheless technically different, because energy levels cannot be reset as those classical constructions would require. The reduction is as follows. For each odd priority, we add one dimension. The energy level in this dimension is decreased by 1 each time this odd priority is visited, and increased by $l$ each time a smaller even priority is visited. If $\mathcal{P}_1$ can keep the energy level positive in all dimensions (for a given initial energy level), then he clearly wins the original parity objective. Conversely, an epSCT strategy that wins the original objective also wins the new game.

Lemma 4. Let $G_p = (S_1, S_2, s_{init}, E, k, w, p)$ be a multi energy parity game with priorities in $\{0, 1, \ldots, 2 \cdot m\}$, s.t. $W$ is the maximal absolute weight appearing on an edge. Then we can construct a multi energy game $G$ with the same set of states, $(k + m)$ dimensions and a maximal absolute weight bounded by $l$, as defined by Lemma 3, s.t. $\mathcal{P}_1$ has a winning strategy in $G$ iff he has one in $G_p$.

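Proof. Let $G_p = (S_1, S_2, s_{init}, E, k, w, p)$ be a MEPG with priorities in $\{0, 1, \ldots, 2 \cdot m\}$. Let $G = (S_1, S_2, s_{init}, E, k + m, w')$ be the MEG obtained from the following transformation. For all $(s,t) \in E$ and $1 \le j \le k$, we set $w'((s,t))(j) = w((s,t))(j)$. On the additional dimensions:

- (a) if $p(t)$ is even, then $w'((s,t))(j) = 0$ for all $k < j \le k + p(t)/2$, and $w'((s,t))(j) = l$ for all $k + p(t)/2 < j \le k + m$;
- (b) if $p(t)$ is odd, then $w'((s,t))(j) = 0$ for all $k < j \le k + m$ with $j \neq k + (p(t)+1)/2$, and $w'((s,t))(k + (p(t)+1)/2) = -1$.

Thus dimension $k + i$ is associated with the odd priority $2 \cdot i - 1$. We have to prove both directions of the equivalence.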
First, suppose $\lambda_1 \in \Lambda_1^{PF}$ is a winning strategy for $\mathcal{P}_1$ in the MEPG $G_p$. By Lemma 3, there is an epSCT of depth at most $l$ for $s_{init}$. Thus, in every repeated sequence of $l$ states, the minimal visited priority will be even. Therefore, on every additional dimension from $k + 1$ to $k + m$, the effect of a sequence of $l$ states is bounded from below by $-1 \cdot (l - 1) + l$, which is positive. Thus strategy $\lambda_1$ is also winning in $G$, with initial credit bounded by $l$ on the additional dimensions.

Second, suppose $\lambda_1 \in \Lambda_1^{PF}$ is a winning strategy for $\mathcal{P}_1$ in the MEG $G$ defined above. Since $\lambda_1$ is winning, it yields an SCT (an epSCT without the parity condition) of bounded depth s.t. $\mathcal{P}_1$ is able to enforce positive energy cycles. By definition of the weights of $G$, this cannot be the case if the minimal priority visited infinitely often is odd. Thus this strategy is winning for parity on $G_p$. It also stays winning for energy on dimensions 1 to $k$, as the weights there are unchanged. □
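The weights added on the $m$ extra dimensions depend only on the priority of the target state. The Python sketch below assumes priorities are plain integers and that $l$ is given (in the text it is the depth bound of Lemma 3); the function names are illustrative. It also checks the intuition behind the proof on two small examples. A block of at most $l$ states whose minimal priority is even has a non-negative effect. An odd minimum drains the corresponding dimension.

```python
from typing import List, Sequence


def extra_weights(p_t: int, m: int, l: int) -> List[int]:
    """Weights on dimensions k+1..k+m of an edge (s, t) with p(t) = p_t (Lemma 4).

    Dimension k+i (1 <= i <= m) is associated with the odd priority 2i-1.
    """
    w = [0] * m
    if p_t % 2 == 0:
        for i in range(1, m + 1):
            if 2 * i - 1 > p_t:          # even p_t is smaller than odd 2i-1: reward
                w[i - 1] = l
    else:
        w[(p_t + 1) // 2 - 1] = -1       # visiting odd priority p_t costs one unit
    return w


def cycle_effect(priorities: Sequence[int], m: int, l: int) -> List[int]:
    """Total effect on the additional dimensions of entering states with these priorities."""
    total = [0] * m
    for p in priorities:
        total = [a + b for a, b in zip(total, extra_weights(p, m, l))]
    return total


print(cycle_effect([3, 3, 2], m=2, l=3))   # prints [0, 1]: even minimum 2, non-negative
print(cycle_effect([3, 3, 3], m=2, l=3))   # prints [0, -3]: odd minimum 3, drains dimension k+2
```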

Bounding the width. Thanks to Lemma 4, we continue with multi energy games without parity. In order 
to bound the overall size of memory for winning strategies, we consider the width of self-covering trees. The 
following lemma states that SCTs, whose width is at most doubly exponential by application of Lemma 3, 
can be compressed into directed acyclic graphs (DAGs) of single exponential width. Thus we eliminate the 
second exponential blow-up and give an overall single exponential bound for memory of winning strategies. 

Lemma 5. Let $G = (S_1, S_2, s_{init}, E, k, w)$ be a multi energy game s.t. $W$ is the maximal absolute weight appearing on an edge and $d$ is the branching degree of $G$. Suppose there exists a finite-memory winning strategy for $\mathcal{P}_1$. Then there exists a winning strategy $\lambda_1 \in \Lambda_1^{PF}$ for $\mathcal{P}_1$ described by a DAG $D$ with two bounds:

- depth at most $l = 2^{(d-1) \cdot |S|} \cdot (W \cdot |S| + 1)^{c \cdot k^2}$, and
- width at most $L = |S| \cdot (2 \cdot l \cdot W + 1)^k$,

where $c$ is a constant independent of $G$. Thus the overall memory needed to win this game is bounded by the single exponential $l \cdot L$.




Fig. 2. Merge between comparable nodes. 



The sketch of this proof is the following. By Lemma 3, there exists a tree $T$, and thus a DAG, that satisfies the bound on depth. We construct a finite sequence of DAGs whose first element is $T$, with three properties:

- (1) each DAG describes a winning strategy for the same initial credit;
- (2) each DAG has the same depth;
- (3) the last DAG of the sequence has its width bounded by $|S| \cdot (2 \cdot l \cdot W + 1)^k$.

Property (3) holds because, at depth at most $l$, every energy label lies in $\{-l \cdot W, \ldots, l \cdot W\}^k$. The sequence $D_0 = T, D_1, D_2, \ldots, D_n$ is built level by level, by merging nodes on the same level of the initial tree according to their labels. The key idea of this procedure is that what actually matters for $\mathcal{P}_1$ is only the current energy level, which is encoded in the node labels of the self-covering tree $T$. Therefore, we merge nodes with identical states and energy levels. Since $\mathcal{P}_1$ can essentially play the same strategy in both nodes, we only keep one of their subtrees. The full proof is in the appendix (A.1).
It is possible to further reduce the practical size of the resulting compressed DAG by merging nodes according to a "greater or equal" relation over energy levels, rather than simple equality (Fig. 2). This improvement is part of the algorithm that follows. It has a significant impact on the practical width of DAGs, which can then be bounded by the number of incomparable labeling vectors instead of the number of distinct ones.
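The two merging rules can be sketched on the labels of a single tree level. This Python illustration is a sketch of the idea, not the appendix procedure verbatim. Labels are (state, energy vector) pairs. Equal labels are collapsed, and a label that is $\succeq$ another label of the same state is dropped, on the assumption that the node with more energy can reuse the other node's subtree. The last function gives the width of the grid $\{0, \ldots, C\}^k$, i.e., the largest possible number of pairwise incomparable vectors. By the de Bruijn-Tengbergen-Kruyswijk theorem, this equals the size of the grid's middle layer.

```python
from itertools import product
from typing import Iterable, List, Tuple

Label = Tuple[str, Tuple[int, ...]]   # (state, energy vector) of a node


def geq(a: Label, b: Label) -> bool:
    """a ⪰ b: same state and a's energy is componentwise >= b's."""
    return a[0] == b[0] and all(x >= y for x, y in zip(a[1], b[1]))


def merge_level(labels: Iterable[Label]) -> List[Label]:
    """Labels kept on one level: equal labels collapse, then ⪰-larger ones are dropped."""
    distinct = list(dict.fromkeys(labels))   # equality merge, first occurrence kept
    return [a for a in distinct
            if not any(b != a and geq(a, b) for b in distinct)]


def largest_antichain_size(C: int, k: int) -> int:
    """Width of {0, ..., C}^k: the number of vectors whose coordinates sum to C*k // 2."""
    return sum(1 for e in product(range(C + 1), repeat=k) if sum(e) == (C * k) // 2)


kept = merge_level([("s", (1, 2)), ("s", (2, 2)), ("s", (0, 3)),
                    ("t", (5, 5)), ("s", (1, 2))])
# kept == [("s", (1, 2)), ("s", (0, 3)), ("t", (5, 5))]
```

For instance, `largest_antichain_size(4, 4)` returns 85, while the grid contains $(4+1)^4 = 625$ vectors.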



Lower bound. In the next lemma, we show that the upper bound is tight. There exist families of games which require exponential memory (in the number of dimensions), even in the simpler case of multi energy objectives without parity and with weights in $\{-1, 0, 1\}$ (Fig. 3). Note that for one-dimension energy parity, it was shown in [14] that exponential memory (in the encoding of weights) may be necessary.



Lemma 6. There exists a family of multi energy games $(G(K))_{K \ge 1} = (S_1, S_2, s_{init}, E, k = 2 \cdot K, w : E \to \{-1, 0, 1\}^k)$ s.t. for any initial credit, $\mathcal{P}_1$ needs exponential memory to win.



The idea is the following. In the example of Fig. 3, suppose $\mathcal{P}_1$ does not remember the exact choices of $\mathcal{P}_2$, which would require an exponential-size Moore machine. Then there exists some sequence of choices of $\mathcal{P}_2$ s.t. $\mathcal{P}_1$ cannot counteract a decrease in energy. Thus, by playing this sequence long enough, $\mathcal{P}_2$ can force $\mathcal{P}_1$ to lose, whatever his initial credit is.





Fig. 3. Family of games requiring exponential memory: $\forall 1 \le i \le K$, $\forall 1 \le j \le k$, $w((s_i, s_{i,L}))(j) = 1$ if $j = 2 \cdot i - 1$, $= -1$ if $j = 2 \cdot i$, and $= 0$ otherwise; $w((s_i, s_{i,L})) = -w((s_i, s_{i,R})) = w((t_i, t_{i,L})) = -w((t_i, t_{i,R}))$; $w((\circ, s_i)) = w((\circ, t_i)) = (0, \ldots, 0)$.

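Proof. We define a family of games $(G(K))_{K \ge 1}$ as an assembly of $k = 2 \cdot K$ gadgets. The first $K$ gadgets belong to $\mathcal{P}_2$ and the remaining $K$ belong to $\mathcal{P}_1$ (Fig. 3). Precisely, we have $|S_1| = |S_2| = 3 \cdot K$, $|S| = |E| = 6 \cdot K = 3 \cdot k$ (linear in $k$), $k = 2 \cdot K$, and $w$ defined as:

$$\forall 1 \le i \le K,\quad w((\circ, s_i)) = w((\circ, t_i)) = (0, \ldots, 0),$$
$$w((s_i, s_{i,L})) = -w((s_i, s_{i,R})) = w((t_i, t_{i,L})) = -w((t_i, t_{i,R})),$$
$$\forall 1 \le j \le k,\quad w((s_i, s_{i,L}))(j) = \begin{cases} 1 & \text{if } j = 2 \cdot i - 1, \\ -1 & \text{if } j = 2 \cdot i, \\ 0 & \text{otherwise,} \end{cases}$$

where $\circ$ denotes any valid predecessor state.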

There exists a winning strategy for $\mathcal{P}_1$ for initial credit $v_0 = (1, \ldots, 1)$. Indeed, consider any strategy of $\mathcal{P}_2$. In each state $t_i$ belonging to $\mathcal{P}_1$, it suffices to play the opposite of the choice made by $\mathcal{P}_2$ on its last visit of $s_i$. This keeps the energy vector positive on all dimensions at all times. This strategy requires remembering the last choice of $\mathcal{P}_2$ in every gadget, so $\mathcal{P}_1$ needs $K$ bits to encode these decisions. Thus, this winning strategy is described by a Moore machine containing $2^K = 2^{k/2}$ states, which is exponential in the number of dimensions $k$.
We claim that, for any initial credit $v_0$, no winning strategy $\lambda_1$ can be described with fewer than $2^K$ states, and we prove this by contradiction. Suppose $\mathcal{P}_1$ plays according to such a strategy $\lambda_1$. Then there exists some $1 \le x \le K$ s.t. $\lambda_1(s_1 \ldots s_x s_{x,L} \ldots t_x) = \lambda_1(s_1 \ldots s_x s_{x,R} \ldots t_x)$. That is, $\mathcal{P}_1$ chooses the same action in $t_x$ against both choices of the adversary. Suppose w.l.o.g. that $\mathcal{P}_1$ plays $t_{x,L}$ in both cases, i.e., $\lambda_1(s_1 \ldots s_x s_{x,L} \ldots t_x) = \lambda_1(s_1 \ldots s_x s_{x,R} \ldots t_x) = t_{x,L}$. By playing $s_{x,L}$, $\mathcal{P}_2$ forces the energy on dimension $2 \cdot x$ to decrease by 2 at every visit of gadget $x$. Similarly, if the strategy of $\mathcal{P}_1$ is to play $t_{x,R}$, then $\mathcal{P}_2$ wins by playing $s_{x,R}$, as dimension $2 \cdot x - 1$ decreases by 2 at every visit. Therefore, whatever the finite initial vector of $\mathcal{P}_1$, $\mathcal{P}_2$ can force a negative dimension by playing long enough. This contradicts the fact that $\lambda_1$ is winning. Hence exponential memory is necessary for this simple family of games $(G(K))_{K \ge 1}$. □
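The family of Fig. 3 is small enough to simulate directly. The Python sketch below makes two assumptions. First, each round visits the gadgets in the order $s_1, \ldots, s_K, t_1, \ldots, t_K$, and the zero-weight connecting edges are skipped. Second, a strategy of $\mathcal{P}_1$ is a function of the gadget index and of $\mathcal{P}_2$'s choices in the current round. All names are illustrative. The `opposite` policy is the $K$-bit strategy of the proof. The `always_left` policy ignores $\mathcal{P}_2$, as in the contradiction argument.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Choice = str  # "L" or "R"


def edge_weight(i: int, side: Choice, k: int) -> List[int]:
    """Weight of (s_i, s_{i,side}); by Fig. 3 this is also the weight of (t_i, t_{i,side})."""
    sign = 1 if side == "L" else -1
    w = [0] * k
    w[2 * i - 2] = sign    # dimension 2i-1 (0-based index 2i-2)
    w[2 * i - 1] = -sign   # dimension 2i   (0-based index 2i-1)
    return w


def simulate(K: int,
             p2_rounds: Sequence[Sequence[Choice]],
             p1_policy: Callable[[int, Sequence[Choice]], Choice],
             credit: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Each round visits s_1..s_K (P2) then t_1..t_K (P1); returns (final, lowest) energy."""
    k = 2 * K
    energy = list(credit)
    lowest = list(energy)

    def step(w: List[int]) -> None:
        for j in range(k):
            energy[j] += w[j]
            lowest[j] = min(lowest[j], energy[j])

    for choices in p2_rounds:
        for i in range(1, K + 1):
            step(edge_weight(i, choices[i - 1], k))
        for i in range(1, K + 1):
            step(edge_weight(i, p1_policy(i, choices), k))
    return energy, lowest


def opposite(i: int, choices: Sequence[Choice]) -> Choice:
    """K-bit strategy of the proof: answer P2's choice in gadget i with the other side."""
    return "R" if choices[i - 1] == "L" else "L"


def always_left(i: int, choices: Sequence[Choice]) -> Choice:
    """A strategy that does not remember P2's choices: always play t_{i,L}."""
    return "L"


K = 3
rounds = list(product("LR", repeat=K)) * 5     # every pattern of P2 choices, 5 times
final, lowest = simulate(K, rounds, opposite, [1] * (2 * K))
# final == [1, 1, 1, 1, 1, 1] and min(lowest) == 0

bad_final, _ = simulate(2, [("L", "L")] * 10, always_left, [5] * 4)
# bad_final == [25, -15, 25, -15]: dimensions 2 and 4 lose 2 units per round
```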

We summarize our results in Theorem 1. 

Theorem 1 (Optimal memory bounds). The following assertions hold: (1) In multi energy parity games, 
if there exists a winning strategy, then there exists a finite-memory winning strategy. (2) In multi energy 
parity and multi mean-payoff games, if there exists a finite-memory winning strategy, then there exists a 
winning strategy with at most exponential memory. (3) There exists a family of multi energy games (without 
parity) with weights in $\{-1, 0, 1\}$ where all winning strategies require at least exponential memory.

Proof. Thanks to [17, Theorem 3], finite-memory winning is equivalent for multi energy and multi mean-payoff games. The rest follows from a straightforward application of Lemma 1, Lemma 4, Lemma 5, and Lemma 6. □



4 Symbolic synthesis algorithm 

We now present a symbolic, incremental and optimal algorithm to synthesize a finite-memory winning strategy in a MEG.⁴ This algorithm outputs a (set of) winning initial credit(s) and a derived finite-memory winning strategy (if one exists), whose size is exponential in the worst case. Its running time is at most exponential. In the light of the results of the previous section, our symbolic algorithm can therefore be considered (worst-case) optimal.

This algorithm computes the greatest fixed point of a monotone operator that defines the sets of winning initial (vectors of) credits for each state of the game. As those sets are upward-closed, they are symbolically represented by their minimal elements. To ensure convergence, the algorithm considers only credits below some threshold, denoted $C$. This does not sacrifice completeness. As we show below, for a game $G = (S_1, S_2, s_{init}, E, k, w)$, it is sufficient to take $C = 2 \cdot l \cdot W$. Here $l$ is the bound on the depth of epSCTs obtained in Lemma 3, and $W$ is the largest absolute value of the weights used in the game. We also show how to extract a finite-state Moore machine from this set of minimal winning initial credits. Finally, we show how to obtain an incremental algorithm by increasing the threshold $C$, starting from small values.

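A controllable predecessor operator. Let $G = (S_1, S_2, s_{init}, E, k, w)$ be a MEG, let $C \in \mathbb{N}$ be a constant, and let $U(C)$ be the set $(S_1 \cup S_2) \times \{0, 1, \ldots, C\}^k$. Let $\mathcal{U}(C) = 2^{U(C)}$ be the powerset of $U(C)$. The operator $\mathrm{Cpre}_C : \mathcal{U}(C) \to \mathcal{U}(C)$ is defined as follows:

$$\mathcal{E}(V) = \{(s_1, e_1) \in U(C) \mid s_1 \in S_1 \wedge \exists (s_1, s) \in E,\ \exists (s, e_2) \in V : e_2 \le e_1 + w(s_1, s)\},$$
$$\mathcal{A}(V) = \{(s_2, e_2) \in U(C) \mid s_2 \in S_2 \wedge \forall (s_2, s) \in E,\ \exists (s, e_1) \in V : e_1 \le e_2 + w(s_2, s)\},$$
$$\mathrm{Cpre}_C(V) = \mathcal{E}(V) \cup \mathcal{A}(V). \qquad (1)$$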
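The following explicit-state Python sketch implements $\mathrm{Cpre}_C$ and the descending chain that computes its greatest fixed point. It enumerates $U(C)$ directly, so its cost is polynomial in $|U(C)|$, as in Lemma 7. The algorithm of the text instead represents each set symbolically by its antichain of minimal elements. The game encoding (two sets of states plus a weight dictionary) and all function names are assumptions made for illustration, and every state is assumed to have at least one successor. The last function mimics the incremental use of the threshold $C$ discussed under "Incrementality".

```python
from itertools import product
from typing import Dict, List, Optional, Set, Tuple

State = str
Vec = Tuple[int, ...]
Elem = Tuple[State, Vec]
Game = Tuple[Set[State], Set[State], Dict[Tuple[State, State], Vec]]  # (S1, S2, w)


def leq(a: Vec, b: Vec) -> bool:
    return all(x <= y for x, y in zip(a, b))


def cpre(V: Set[Elem], game: Game, C: int, k: int) -> Set[Elem]:
    """One application of Cpre_C (eq. (1)) to an explicitly enumerated set V."""
    S1, S2, w = game

    def reaches(s: State, e: Vec, t: State) -> bool:
        # exists (t, e2) in V with e2 <= e + w(s, t)
        target = tuple(x + y for x, y in zip(e, w[(s, t)]))
        return any(q == t and leq(e2, target) for (q, e2) in V)

    result: Set[Elem] = set()
    for s in S1 | S2:
        succs = [t for (src, t) in w if src == s]
        for e in product(range(C + 1), repeat=k):
            moves = [reaches(s, e, t) for t in succs]
            if (s in S1 and any(moves)) or (s in S2 and all(moves)):
                result.add((s, e))
    return result


def cpre_fp(game: Game, C: int, k: int) -> Set[Elem]:
    """Greatest fixed point via U_0 = U(C), U_{i+1} = Cpre_C(U_i) (chain (2))."""
    S1, S2, _ = game
    U = {(s, e) for s in S1 | S2 for e in product(range(C + 1), repeat=k)}
    while True:
        nxt = cpre(U, game, C, k)
        if nxt == U:
            return U
        U = nxt


def first_winning_threshold(game: Game, s_init: State, k: int,
                            C_max: int) -> Optional[Tuple[int, List[Vec]]]:
    """Try C = 0, 1, ..., C_max; return (C, minimal credits at s_init) or None."""
    for C in range(C_max + 1):
        credits = [e for (s, e) in cpre_fp(game, C, k) if s == s_init]
        if credits:
            return C, sorted(e for e in credits
                             if not any(f != e and leq(f, e) for f in credits))
    return None


# Example: P1 owns a, P2 owns b; a -> b costs 1, b -> a gains 1, b -> b is neutral.
game: Game = ({"a"}, {"b"}, {("a", "b"): (-1,), ("b", "a"): (1,), ("b", "b"): (0,)})
fp = cpre_fp(game, C=2, k=1)
# fp == {("a", (1,)), ("a", (2,)), ("b", (0,)), ("b", (1,)), ("b", (2,))}
```

In this example, `first_winning_threshold(game, "a", 1, 5)` returns `(1, [(1,)])`: with $C = 0$ the fixed point is empty, while $C = 1$ already yields the winning credit 1 at state `a`.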

Intuitively, $\mathrm{Cpre}_C(V)$ returns the set of energy levels from which $\mathcal{P}_1$ can force an energy level in $V$ in one step. The operator $\mathrm{Cpre}_C$ is $\subseteq$-monotone over the complete lattice $\mathcal{U}(C)$. Hence $\mathrm{Cpre}_C$ has a greatest fixed point in $\mathcal{U}(C)$, denoted by $\mathrm{Cpre}_C^*$. As usual, the greatest fixed point of the operator

4 Note that the symbolic algorithm can be applied to MEPGs and MMPPGs after removal of the parity condition 
by applying the construction of Lemma 4. 
$\mathrm{Cpre}_C$ can be computed by successive approximations, as the last element of the following finite $\subseteq$-descending chain. We define the algorithm CpreFP that computes this greatest fixed point:

$$U_0 = U(C),\quad U_1 = \mathrm{Cpre}_C(U_0),\quad \ldots,\quad U_n = \mathrm{Cpre}_C(U_{n-1}) = U_{n-1}. \qquad (2)$$

The set $U_i$ contains all the energy levels that are sufficient to keep the energy positive in all dimensions for $i$ steps. Note that the length of this chain is bounded by $|U(C)|$. Moreover, the time needed to compute each element of the chain is bounded by a polynomial in $|U(C)|$. As a consequence, we obtain the following lemma.

Lemma 7. Let $G = (S_1, S_2, s_{init}, E, k, w)$ be a multi energy game and $C \in \mathbb{N}$ be a constant. Then $\mathrm{Cpre}_C^*$ can be computed in time bounded by a polynomial in $|U(C)|$, i.e., exponential in the size of $G$.

Symbolic representation. To represent the sets manipulated by the $\mathrm{Cpre}_C$ operator symbolically, we exploit the following partial order. For $(s, e), (s', e') \in U(C)$, we define

$$(s, e) \preceq (s', e') \text{ iff } s = s' \text{ and } e \le e'. \qquad (3)$$

A set $V \in \mathcal{U}(C)$ is closed if for all $(s, e), (s', e') \in U(C)$, $(s, e) \in V$ and $(s, e) \preceq (s', e')$ imply $(s', e') \in V$. By definition of $\mathrm{Cpre}_C$, we get the following property.

Lemma 8. All sets $U_i$ in eq. (2) are closed for $\preceq$.

Therefore, every set $U_i$ in the descending chain of eq. (2) can be symbolically represented by its minimal elements $\mathrm{Min}_\preceq(U_i)$, which form an antichain for $\preceq$. The largest antichain can be exponential in $G$. Even so, this representation is often much more efficient in practice, even for small values of the parameters. For example, with $C = 4$ and $k = 4$, the cardinality of a set can be as large as $|U_i| \le 625$ for a single state, whereas the size of the largest antichain is bounded by $|\mathrm{Min}_\preceq(U_i)| \le 85$. Antichains have proved to be very effective; see for example [1,21,22]. Therefore, our algorithm is expected to perform well in practice.

Correctness and completeness. The following two lemmas relate the greatest fixed point $\mathrm{Cpre}_C^*$ to the existence of winning strategies for $\mathcal{P}_1$ in $G$. We start with the correctness of the symbolic algorithm.

Lemma 9 (Correctness). Let $G = (S_1, S_2, s_{init}, E, k, w)$ be a multi energy game and let $C \in \mathbb{N}$ be a constant. Suppose there exists $(c_1, \ldots, c_k) \in \mathbb{N}^k$ s.t. $(s_{init}, (c_1, \ldots, c_k)) \in \mathrm{Cpre}_C^*$. Then $\mathcal{P}_1$ has a winning strategy in $G$ for initial credit $(c_1, \ldots, c_k)$. Moreover, the memory needed by $\mathcal{P}_1$ is bounded by $|\mathrm{Min}_\preceq(\mathrm{Cpre}_C^*)|$, the size of the antichain of minimal elements of the fixed point.

Given the set of winning initial credits output by algorithm CpreFP, it is straightforward to derive a corresponding winning strategy of at most exponential size. Indeed, for a winning initial credit $c \in \mathbb{N}^k$, we build a Moore machine with the following components:

- (i) its states are the minimal elements of the fixed point (an antichain at most exponential in $G$);
- (ii) its initial state is any such element $(t, u)$ with $t = s_{init}$ and $u \le c$;
- (iii) its next-action function prescribes an action that ensures remaining in the fixed point;
- (iv) its update function maintains an accurate energy level in the memory.

Proof. We denote by $c$ the $k$-dimensional credit vector $(c_1, \ldots, c_k)$. W.l.o.g. we assume that the states of $G$ alternate between positions of $\mathcal{P}_1$ and positions of $\mathcal{P}_2$; otherwise, we split the needed edges by introducing dummy states. From $\mathrm{Cpre}_C^*$, we construct a Moore machine $\mathcal{M} = (Q^{\mathcal{M}}, q_0^{\mathcal{M}}, \delta^{\mathcal{M}}, \mathrm{Act}^{\mathcal{M}})$ which respects the following definitions:

– $Q^{\mathcal{M}} = \mathrm{Min}_\preceq\{(t, u) \in S_1 \times \{0, \ldots, C\}^k \mid (t, u) \in \mathrm{Cpre}_C^*\}$. The set of states of the machine is the antichain of $\preceq$-minimal elements of the fixed point that belong to $\mathcal{P}_1$. Note that the size of this antichain is bounded by an exponential in the size of the game.

– $q_0^{\mathcal{M}}$ is any element $(t, u)$ in $Q^{\mathcal{M}}$ s.t. $t = s_{init}$ and $u \le c$. Such an element is guaranteed to exist, as $(s_{init}, c) \in \mathrm{Cpre}_C^*$.



– For all $(t, u) \in Q^{\mathcal{M}}$, we define $\mathrm{Act}^{\mathcal{M}}((t, u))$ by choosing any edge $(t, t') \in E$ s.t. there exists $(t', u') \in \mathrm{Cpre}_C^*$ with $u' \le u + w(t, t')$. Such an edge is guaranteed to exist by definition of $\mathrm{Cpre}_C$ and because $(t, u) \in \mathrm{Cpre}_C^*$.

– $\delta^{\mathcal{M}} : Q^{\mathcal{M}} \times ((S_2 \times S) \cap E) \to Q^{\mathcal{M}}$ is any partial function that respects the following constraint. If $\mathrm{Act}^{\mathcal{M}}((t, u)) = (t, t')$, then $\delta^{\mathcal{M}}((t, u), (t', t''))$ is defined for every $(t', t'') \in E$. It can be chosen equal to any $(t'', u'') \in Q^{\mathcal{M}}$ s.t. $u'' \le u + w(t, t') + w(t', t'')$. Such a $u''$ is guaranteed to exist by definition of $\mathrm{Cpre}_C$ and because $\mathrm{Cpre}_C^*$ is a fixed point.

Now, let us prove the following for any initial prefix $s_0 s_1 \ldots s_{2n}$ of even length in $G$ that is compatible with $\mathcal{M}$: $c + EL(s_0 s_1 \ldots s_{2n-1}) \ge 0$ and $c + EL(s_0 s_1 \ldots s_{2n}) \ge 0$. To establish this, we first prove by induction on $n$ that $c + EL(s_0 s_1 \ldots s_{2n}) \ge u$. Here $u$ is the energy level in the label of the state reached by the Moore machine $\mathcal{M}$ after reading the prefix $s_0 s_1 \ldots s_{2n}$. The base case $n = 0$ is trivial. For the induction step, assume that the property holds for $n - 1$; we establish it for $n$. By the induction hypothesis, $c + EL(s_0 s_1 \ldots s_{2(n-1)}) \ge u$, where $u$ is the energy level in the label of the state $q$ reached after reading $s_0 s_1 \ldots s_{2(n-1)}$. Now, assume that $\mathrm{Act}^{\mathcal{M}}(q) = (t, t')$. Then $s_{2(n-1)} = t$ and $\mathcal{P}_1$ chooses to play $(t, t')$, so $s_{2(n-1)+1} = t'$. For every possible choice $(t', t'')$ of $\mathcal{P}_2$, the definition of $\mathcal{M}$ ensures that the energy level $u''$ labelling $\delta^{\mathcal{M}}(q, (t', t''))$ satisfies $u'' \le u + w(t, t') + w(t', t'')$. This establishes the property. Therefore, the strategy of $\mathcal{P}_1$ based on $\mathcal{M}$ keeps the energy positive at all times for initial credit $c$, which concludes the proof. □
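The construction of $\mathcal{M}$ in the proof can be transcribed almost literally. The Python sketch below takes the fixed point as an explicit set. For illustration, the fixed point is written out by hand for a small alternating one-dimensional game with threshold $C = 2$. The sketch then builds $Q^{\mathcal{M}}$, $\mathrm{Act}^{\mathcal{M}}$ and $\delta^{\mathcal{M}}$; the encoding and names are assumptions. As in the proof, the memory stores a $\preceq$-minimal element below the reachable energy, which is the invariant $c + EL \ge u$ used in the induction.

```python
from typing import Dict, Set, Tuple

State = str
Vec = Tuple[int, ...]
Elem = Tuple[State, Vec]


def leq(a: Vec, b: Vec) -> bool:
    return all(x <= y for x, y in zip(a, b))


def add(a: Vec, b: Vec) -> Vec:
    return tuple(x + y for x, y in zip(a, b))


def build_moore(fixpoint: Set[Elem], S1: Set[State], w: Dict[Tuple[State, State], Vec]):
    """Moore machine of Lemma 9 for an alternating game, built from Cpre_C^*."""
    p1 = [x for x in fixpoint if x[0] in S1]
    # Memory states: the ⪯-minimal elements of the fixed point owned by P1 (sorted for determinism).
    Q = sorted(x for x in p1
               if not any(y != x and y[0] == x[0] and leq(y[1], x[1]) for y in p1))
    act: Dict[Elem, Tuple[State, State]] = {}
    for (t, u) in Q:
        for (s, t1), wt in w.items():
            if s == t and any(q == t1 and leq(e, add(u, wt)) for (q, e) in fixpoint):
                act[(t, u)] = (t, t1)      # a move that stays inside the fixed point
                break
    delta: Dict[Tuple[Elem, Tuple[State, State]], Elem] = {}
    for (t, u), (_, t1) in act.items():
        for (s, t2), wt2 in w.items():
            if s == t1:                    # every answer (t1, t2) of P2
                bound = add(add(u, w[(t, t1)]), wt2)
                delta[((t, u), (t1, t2))] = next(
                    q for q in Q if q[0] == t2 and leq(q[1], bound))
    return Q, act, delta


# Alternating example: P1 owns a and c, P2 owns b; fixed point computed for C = 2.
w = {("a", "b"): (-1,), ("c", "b"): (0,), ("b", "a"): (1,), ("b", "c"): (0,)}
fixpoint = {("a", (1,)), ("a", (2,)), ("c", (0,)), ("c", (1,)), ("c", (2,)),
            ("b", (0,)), ("b", (1,)), ("b", (2,))}
Q, act, delta = build_moore(fixpoint, {"a", "c"}, w)
# Q == [("a", (1,)), ("c", (0,))]: two memory states suffice here
```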

Completeness of the symbolic algorithm is guaranteed when a sufficiently large threshold $C$ is used, as established in the following lemma.

Lemma 10 (Completeness). Let $G = (S_1, S_2, s_{init}, E, k, w)$ be a multi energy game in which all absolute values of weights are bounded by $W$. Suppose $\mathcal{P}_1$ has a winning strategy in $G$ and $T = (Q, R)$ is a self-covering tree for $G$ of depth $l$. Then $(s_{init}, (C, \ldots, C)) \in \mathrm{Cpre}_C^*$ for $C = 2 \cdot l \cdot W$.

Remark 1. This algorithm is complete in the following sense: if a winning strategy exists for $\mathcal{P}_1$, then for $C = 2 \cdot l \cdot W$ it outputs at least one winning initial credit (and the derived strategy). This differs from the fixed initial credit problem, which consists in deciding whether a particular given credit vector is winning and is known to be EXPSPACE-hard [10,27]. In general, there may exist winning credits incomparable to those captured by algorithm CpreFP.

Proof. To establish this property, we first construct from the set of labels of $T$ a set $f$ which is increasing for the operator $\mathrm{Cpre}_C$, i.e., $\mathrm{Cpre}_C(f) \supseteq f$, and s.t. $(s_{init}, (C, \ldots, C)) \in f$. We define $f$ from $T = (Q, R)$ as follows. Let $C' \in \mathbb{N}$ be the smallest non-negative integer s.t. $u(i) + C' \ge 0$ for all $q \in Q$ with $\Theta(q) = (t, u)$ and all dimensions $1 \le i \le k$. $C'$ is bounded from above by $l \cdot W$ for the following reason. At the root, all dimensions are equal to 0. On every path from the root to a leaf in $T$, each dimension is decreased at most $l$ times, by an amount bounded by $W$ each time. For any $q \in Q$, we denote by $\Theta(q) + C'$ the label of $q$ with the energy level increased by $C'$ in all dimensions. That is, if $\Theta(q) = (t, u)$, then $\Theta(q) + C' = (t, u + (C', \ldots, C'))$. For every node in $Q$, the energy in the label is at most $l \cdot W$ in every dimension. Thus the shifted label stays below $C = 2 \cdot l \cdot W$. Now, we define the set $f$ as follows:

$$f = \{(t, u) \in U(C) \mid \exists q \in Q,\ \Theta(q) + C' \preceq (t, u)\}. \qquad (4)$$

So, $f$ is the $\preceq$-closure of the set of labels of $T$ shifted by $C'$ in all dimensions.

First, note that $(s_{init}, (C, \ldots, C)) \in f$, since the label of the root of $T$ is $(s_{init}, (0, \ldots, 0))$ and $C' \le C$. Second, let us show that $\mathrm{Cpre}_C(f) \supseteq f$. Take any $(t, u) \in f$; we show that $(t, u) \in \mathrm{Cpre}_C(f)$. We distinguish two cases.

(A) $t \in S_1$. By definition of $f$, there exists $q \in Q$ s.t. $\Theta(q) + C' \preceq (t, u)$. W.l.o.g. we can assume that $q$ is not a leaf. Otherwise, there exists an ancestor $q'$ of $q$ s.t. $\Theta(q') \preceq \Theta(q)$ (recall that the set is described by its minimal elements). By definition of $T$, there exist $(t, t') \in E$ and $q' \in Q$ s.t. $(q, q') \in R$ and $\Theta(q') = \Theta(q) + w(t, t')$. Let $(t', v) = \Theta(q') + C'$. By definition of $f$, we have $(t', v) \in f$. By eq. (1), it follows that $(t, u) \in \mathrm{Cpre}_C(f)$.

(B) $t \in S_2$. By definition of $f$, there exists $q \in Q$ s.t. $\Theta(q) + C' \preceq (t, u)$. Again, w.l.o.g. we can assume that $q$ is not a leaf. Otherwise, there exists an ancestor $q'$ of $q$ s.t. $\Theta(q') \preceq \Theta(q)$. By definition of $T$, for every $(t, t') \in E$ there is $q' \in Q$ s.t. $(q, q') \in R$ and $\Theta(q') = \Theta(q) + w(t, t')$. Let $(t', v) = \Theta(q') + C'$. By definition of $f$, we have $(t', v) \in f$. By eq. (1), it follows that $(t, u) \in \mathrm{Cpre}_C(f)$.

Now, let us show that $f \subseteq \mathrm{Cpre}_C^*$. This is a direct consequence of the monotonicity of $\mathrm{Cpre}_C$. It is well known that the greatest fixed point of a monotone function on a complete lattice is the least upper bound of all points at which the function is increasing. That is, $\mathrm{Cpre}_C^* = \bigcup\{e \mid e \subseteq \mathrm{Cpre}_C(e)\} \supseteq f$. Since $(s_{init}, (C, \ldots, C)) \in f$, this concludes the proof. □

Remark 2. The exponential bound on memory obtained in Lemma 5 can also be derived from the Moore machine construction of Lemma 9, since this method is complete by Lemma 10. Still, the DAG construction of Lemma 5 is interesting in its own right. It introduces the concept of node merging, which underlies the correctness of the symbolic algorithm while remaining transparent in its use.

Incrementality. The threshold $2 \cdot l \cdot W$ is sufficient, but $\mathcal{P}_1$ may be able to win the game even if his energy level is bounded above by some smaller value. So, in practice, we can use Lemma 9 to justify an incremental algorithm. It starts with small values of the parameter $C$ and stops as soon as a winning strategy is found, or when $C$ reaches the threshold $2 \cdot l \cdot W$ without a winning strategy having been found.

Application of the symbolic algorithm to MEPGs and MMPGs. Using the reduction of Lemma 4 
that allows us to remove the parity condition, and the equivalence between multi energy games and multi 
mean-payoff games for finite-memory strategies (given by [17, Theorem 3]), along with Lemma 7 (complexity), 
Lemma 9 (correctness) and Lemma 10 (completeness), we obtain the following result. 

Theorem 2 (Symbolic and incremental synthesis algorithm). Let $G_p$ be a multi energy (resp. multi mean-payoff) parity game. Algorithm CpreFP is a symbolic and incremental algorithm that synthesizes a winning strategy in $G_p$ with memory of at most exponential size, if a winning (resp. finite-memory winning) strategy exists. In the worst case, the algorithm CpreFP takes exponential time.

Proof. The correctness and completeness for algorithm CpreFP on multi energy games are resp. given by 
Lemma 9 and Lemma 10. Extension to mean-payoff games (under finite memory) is given by [17, Theorem 
3], whereas the parity condition can be encoded as energy thanks to Lemma 4. Exponential worst-case 
complexity of the algorithm CpreFP is induced by Lemma 7. □ 

5 Trading finite memory for randomness 

In this section, we answer the fundamental question regarding the trade-off of memory for randomness in strategies. We study in which kinds of games $\mathcal{P}_1$ can replace a pure finite-memory winning strategy by an equally powerful, yet conceptually simpler, randomized memoryless one. We also discuss how memory is encoded into probability distributions. We do not consider wider strategy classes (e.g., randomized finite-memory), nor do we allow randomization for $\mathcal{P}_2$ (which in most cases is dispensable anyway). Our aim is a better understanding of the underlying mechanics of memory and randomization, in order to provide alternative strategy representations of practical use. We do not aim to explore more complex games with wider strategy classes (Lemma 19 gives a glimpse of these).





             | Multi energy | Energy parity | Multi MP (parity) | MP parity
one-player   |      ×       |       ×       |         ✓         |     ✓
two-player   |      ×       |       ×       |         ×         |     ✓

Table 1. When pure finite memory for $\mathcal{P}_1$ can be traded for randomized memorylessness.
We present an overview of our results in Tab. 1 and summarize them in Theorem 3. We do not consider the opposite implication, i.e., whether a randomized memoryless strategy can always be encoded into an equivalent finite-memory one. In general, this is not the case, even for classes of games where memory can be traded for randomness. This is easily witnessed on the one-player multi mean-payoff game depicted in Fig. 4. Indeed, expectation (1,1) is achievable with a simple uniform distribution, while it is not achievable with a pure strategy, even one with arbitrarily large (or infinite) memory.







Fig. 4. Randomization can replace memory, but not the opposite. 



We break down these results into three subsections: energy games, multi mean-payoff (parity) games, 
and single mean-payoff parity games. We start with energy games. 

5.1 Randomization and energy games 

Randomization is not helpful for energy objectives, even in one-player games. The proof argument is obtained 
from the intuition that energy objectives are similar in spirit to safety objectives. 

Lemma 11. Randomization is not helpful for almost-sure winning in one-player and two-player energy, 
multi energy, energy parity and multi energy parity games. 

Proof. Let $G_p$ be a game with an energy objective, and consider an almost-sure winning strategy $\lambda_1$ of $\mathcal{P}_1$. If some play $\pi$ consistent with $\lambda_1$ violated the energy objective, then a finite prefix $\rho$ of $\pi$ would witness the violation (the energy level drops below zero along $\rho$). As this finite prefix has positive probability (otherwise the play would not be consistent), and $\lambda_1$ is almost-sure winning, no such play exists. In other words, $\lambda_1$ is a sure winning strategy. Since randomization does not help for sure winning, it follows that randomization is not helpful for one-player and two-player energy, multi energy, energy parity and multi energy parity games. □
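The safety-like nature of energy objectives used in this proof can be made concrete: a violation is always witnessed by a finite prefix along which some component of the energy level becomes negative. The Python sketch below is only an illustration; the function name `violation_prefix` and the encoding of weights as tuples are our own hypothetical choices. It scans a finite sequence of multi-dimensional weights from a given initial credit and returns the length of the shortest witnessing prefix, if any.

```python
def violation_prefix(weights, credit):
    """Return the number of edges of the shortest prefix after which some
    dimension of the energy level is negative, or None if the given finite
    sequence of weight vectors never violates the energy objective.

    weights: iterable of k-dimensional weight tuples (one per edge)
    credit:  k-dimensional initial credit
    """
    level = list(credit)
    if any(x < 0 for x in level):
        return 0  # the empty prefix already witnesses the violation
    for n, w in enumerate(weights, start=1):
        level = [x + y for x, y in zip(level, w)]
        if any(x < 0 for x in level):
            return n  # finite witness: once negative, the play is lost
    return None


print(violation_prefix([(1, -1), (-2, 1), (3, 0)], (0, 1)))  # 2
print(violation_prefix([(1, 0), (0, 1)], (0, 0)))            # None
```

Because the witness is finite, it carries positive probability under any strategy profile that makes it possible. This is exactly the step that turns almost-sure winning into sure winning in the proof.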

5.2 Randomization and multi mean-payoff (parity) games 

Randomized memoryless strategies can replace pure finite-memory ones in the one-player multi mean-payoff 
parity case, but not in the two-player one, even without parity. We first note a useful link between satisfaction 
and expectation semantics for the mean-payoff objective. 

Lemma 12. Let $G = (S_1, S_2, s_{\mathrm{init}}, E, k, w)$ be a game structure with objective $\phi = \mathsf{MeanPayoff}_G(v)$ for some threshold vector $v \in \mathbb{Q}^k$. Let $\lambda_1 \in \Lambda_1$ be a strategy of $\mathcal{P}_1$. If $\lambda_1$ is almost-sure winning for $\phi$ (i.e., winning for 1-satisfaction), then $\lambda_1$ is also winning for $v$-expectation for the mean-payoff function MP. The converse does not hold.

Proof. We first discuss the claimed implication. Suppose 1-satisfaction holds. Then, for every strategy $\lambda_2 \in \Lambda_2$ of $\mathcal{P}_2$, the set of consistent plays of value $\geq v$ has measure 1, while the set of consistent plays of value $\not\geq v$ has measure 0, by definition. Therefore, the expectation $\mathbb{E}^{\lambda_1,\lambda_2}_{s_{\mathrm{init}}}(\mathsf{MP})$ is at least $v$, and $v$-expectation holds.

To show that the converse does not hold, consider the simple one-player game depicted in Fig. 4. Let $\lambda_1$ be a simple coin flip on $s_1$, i.e., $\lambda_1(s_1)(s_2) = 1/2$, $\lambda_1(s_1)(s_3) = 1/2$, $\lambda_1(s_2)(s_2) = 1$ and $\lambda_1(s_3)(s_3) = 1$. The expectation of this strategy is $v = (1, 1)$. Nevertheless, the probability of achieving a mean-payoff of at least $v$ is $1/2 < 1$, which shows that $\lambda_1$ does not achieve 1-satisfaction for $\mathsf{MeanPayoff}_G(v)$. □
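The gap between expectation and 1-satisfaction in this proof can be checked numerically. The weights of Fig. 4 are not reproduced here, so the Python sketch below uses hypothetical self-loop weights: (2, 2) on s2 and (0, 0) on s3. They are chosen only so that the coin flip yields expectation (1, 1). With these weights, the plays reaching the threshold (1, 1) have probability 1/2, as in the proof.

```python
from fractions import Fraction

# Hypothetical weights of the two absorbing self-loops reached from s1
# (illustrative only; not read off Fig. 4).
LOOP_WEIGHT = {"s2": (2, 2), "s3": (0, 0)}
# Coin flip on s1, as in the proof of Lemma 12.
STRATEGY_AT_S1 = {"s2": Fraction(1, 2), "s3": Fraction(1, 2)}

def geq(u, v):
    """Componentwise comparison of two weight vectors."""
    return all(a >= b for a, b in zip(u, v))

# Each branch is absorbing, so its mean-payoff is the weight of its self-loop
# (the single initial edge is negligible in the long run).
expectation = tuple(
    sum(p * LOOP_WEIGHT[t][d] for t, p in STRATEGY_AT_S1.items()) for d in range(2)
)
threshold = (1, 1)
prob_satisfied = sum(
    p for t, p in STRATEGY_AT_S1.items() if geq(LOOP_WEIGHT[t], threshold)
)
print(expectation == threshold, prob_satisfied)  # True 1/2
```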
The fundamental difference between energy and mean-payoff is that energy requires a property to hold at all times (in that sense, it is similar to safety), while mean-payoff is a limit property. As a consequence, what matters for mean-payoff are the long-run frequencies of weights, not their order of appearance, as opposed to the energy case.

Lemma 13. Pure finite-memory winning strategies can be traded for equally powerful randomized memoryless ones in one-player multi mean-payoff parity games, for both satisfaction and expectation semantics. In two-player games, randomized memoryless strategies are not as powerful, even restricted to the expectation semantics, without a parity condition, and with only 2 dimensions.

For the one-player case, we extract the visit frequencies of the edges of the graph from the regular outcome induced by the finite-memory strategy of $\mathcal{P}_1$. We build a randomized strategy whose probability distributions on edges yield exactly the same frequencies in the long run. Therefore, if the original pure finite-memory strategy of $\mathcal{P}_1$ is surely winning, the randomized one is almost-surely winning. For the two-player case, this approach cannot be used: frequencies are not well defined, since the strategy of $\mathcal{P}_2$ is unknown. Consider a game in which winning requires a perfect balance between the frequencies of appearance of two sets of edges in a play (Fig. 5). To almost-surely achieve mean-payoff vector (0, 0), $\mathcal{P}_1$ must ensure that the long-term balance between edges $(s_4, s_5)$ and $(s_4, s_6)$ is the same as the one between edges $(s_1, s_3)$ and $(s_1, s_2)$. This is achievable with memory, as it suffices to react immediately to compensate the choice of $\mathcal{P}_2$. However, given a randomized memoryless strategy of $\mathcal{P}_1$, $\mathcal{P}_2$ always has a strategy to enforce unbalanced long-term frequencies, and thus the game cannot be won almost-surely by $\mathcal{P}_1$ with such a strategy. Achieving expected mean-payoff (0, 0) is also excluded.




Fig. 5. Memory is needed to enforce perfect long-term balance.

Fig. 6. Mixing strategies that are resp. good for Büchi and good for energy.

Proof. We begin with the one-player case. Let $G_p$ be a multi mean-payoff parity game, and let $\lambda_1^{pf} \in \Lambda_1^{PF}$ be the pure finite-memory strategy of the player. Since it is pure and finite-memory, its outcome is a regular word $\pi = \rho_1 \cdot \rho_2^\omega$, with $\rho_1 \in S^*$, $\rho_2 \in S^+$. Let $\phi = \mathsf{MeanPayoff}_{G_p}(v) \cap \mathsf{Parity}_{G_p}$ be the multi mean-payoff parity objective for some threshold vector $v \in \mathbb{Q}^k$. Suppose this strategy achieves $\alpha$-satisfaction for $\phi$ and $\beta$-expectation for the MP function, for some $\alpha$, $\beta$. We claim that there exists a randomized memoryless strategy $\lambda_1^{rm} \in \Lambda_1^{RM}$ that is also $\alpha$-satisfying for $\phi$ and that achieves $\beta$-expectation for the MP function, and we show how to build it.

We denote concatenation by the $\cdot$ symbol. Given a finite word $\rho \in S^*$ and two states $s, s' \in S$, we resp. denote by $\mathrm{occ}(s, \rho)$ and $\mathrm{occ}((s, s'), \rho)$ the number of occurrences of the state $s$ and of the transition $(s, s')$ in the word $\rho$. We add the subscript $\circ$ when we count the first state of the word as the successor of the last one (i.e., the word is a cycle in the game graph). That is, $\mathrm{occ}_\circ(\ast, \rho) = \mathrm{occ}(\ast, \rho \cdot \mathrm{First}(\rho))$.

Let us consider the mean-payoff of the outcome of strategy $\lambda_1^{pf}$. Recall that for a play $\pi = s_1 s_2 s_3 \ldots \in \mathsf{Plays}(G)$, we have:

$$\mathsf{MP}(\pi) = \liminf_{n \to \infty} \frac{1}{n} \sum_{1 \le i \le n} w(s_i, s_{i+1}).$$

Since the play induced by $\lambda_1^{pf}$ is regular, the limit is well defined and we may express the mean-payoff in terms of frequencies, that is

$$\mathsf{MP}(\pi) = \sum_{(s,s') \in E} w(s,s') \cdot \mathrm{freq}_\infty((s,s')),$$
where $\mathrm{freq}_\infty$ denotes the long-term frequency of a transition, defined as

$$\forall (s,s') \in E,\quad \mathrm{freq}_\infty((s,s')) = \frac{\mathrm{occ}_\circ((s,s'), \rho_2)}{|\rho_2|}.$$

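We define the randomized memoryless strategy $\lambda_1^{rm}$ as follows. For all $s, s' \in S$ with $(s, s') \in E$, let $X = \{(s, t) \mid t \in S,\ (s, t) \in (\rho_1 \cdot \mathrm{First}(\rho_2))\}$ be the set of transitions leaving $s$ in $\rho_1 \cdot \mathrm{First}(\rho_2)$, and

$$\lambda_1^{rm}(s)(s') = \begin{cases} \dfrac{1}{|X|} & \text{if } s \in \rho_1 \wedge s \notin \rho_2 \text{ and } (s, s') \in X,\\[2mm] \dfrac{\mathrm{occ}_\circ((s,s'), \rho_2)}{\mathrm{occ}(s, \rho_2)} & \text{if } s \in \rho_2,\\[2mm] 0 & \text{otherwise.} \end{cases}$$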



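Intuitively, we fix a uniform distribution over the transitions of the finite prefix $\rho_1$, as we only need to ensure reaching the end component defined by $\rho_2$ with probability 1, and the relative frequencies in $\rho_1$ do not matter (because these weights and priorities are negligible in the long run). On the contrary, we use the exact frequencies for the transitions of $\rho_2$, as they prevail in the long run. Note that $\lambda_1^{rm}$ is a correctly defined randomized memoryless strategy: since every occurrence of $s$ in $\rho_2$ has exactly one successor in $\rho_2 \cdot \mathrm{First}(\rho_2)$, the probabilities $\mathrm{occ}_\circ((s,s'),\rho_2)/\mathrm{occ}(s,\rho_2)$ sum to 1 over $s'$.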
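The construction of the randomized memoryless strategy from the lasso ρ1·ρ2^ω can be sketched in Python as follows. Plays are encoded as lists of state names, and the function name `randomized_memoryless` is our own. Probabilities are kept as exact fractions so that the frequencies of ρ2 are reproduced exactly.

```python
from collections import Counter
from fractions import Fraction

def randomized_memoryless(rho1, rho2):
    """Build the randomized memoryless strategy of the proof from the lasso
    rho1 . rho2^omega (lists of state names, rho2 non-empty).
    Returns a dict: state -> {successor: probability}."""
    cycle = rho2 + [rho2[0]]                      # rho2 . First(rho2)
    prefix = rho1 + [rho2[0]]                     # rho1 . First(rho2)
    occ_state = Counter(rho2)                     # occ(s, rho2)
    occ_edge = Counter(zip(cycle, cycle[1:]))     # occ_o((s, s'), rho2)
    prefix_edges = set(zip(prefix, prefix[1:]))   # transitions of rho1 . First(rho2)
    strategy = {}
    for s in set(rho1) | set(rho2):
        if s in occ_state:
            # s belongs to rho2: reproduce the frequencies observed on the cycle
            strategy[s] = {t: Fraction(n, occ_state[s])
                           for (u, t), n in occ_edge.items() if u == s}
        else:
            # s only occurs in rho1: uniform distribution over X
            succ = sorted(t for (u, t) in prefix_edges if u == s)
            strategy[s] = {t: Fraction(1, len(succ)) for t in succ}
    return strategy


print(randomized_memoryless(["a"], ["b", "c", "b", "d"])["b"])
```

For instance, with ρ1 = a and ρ2 = b c b d, state b is left towards c and towards d with probability 1/2 each, since each of these transitions occurs once among the two visits of b in the cycle.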

Obviously, $\lambda_1^{rm}$ yields a Markov chain over the states of $(\rho_1 \cup \rho_2)$ s.t. the states of $(\rho_1 \setminus \rho_2)$ are transient and the states of $\rho_2$ constitute an end component that is reached with probability one. Thus, the mean-payoff induced by $\lambda_1^{rm}$ depends entirely on the mean-payoff value of this end component. As a consequence, proving that the transition frequencies in the end component are exactly the frequencies $\mathrm{freq}_\infty$ defined by $\lambda_1^{pf}$ will imply the claim on mean-payoff. Moreover, parity remains satisfied, as the set of infinitely often visited states is the same for the pure strategy and (almost surely) for the randomized one. Let $T = \{t_1, t_2, \ldots, t_m\}$ be the set of states that appear in $\rho_2$. This end component is an ergodic Markov chain $M_e = (T, P)$ with the following matrix of transition probabilities:



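$$P = \begin{pmatrix} \dfrac{\mathrm{occ}_\circ((t_1,t_1),\rho_2)}{\mathrm{occ}(t_1,\rho_2)} & \cdots & \dfrac{\mathrm{occ}_\circ((t_1,t_m),\rho_2)}{\mathrm{occ}(t_1,\rho_2)} \\ \vdots & \ddots & \vdots \\ \dfrac{\mathrm{occ}_\circ((t_m,t_1),\rho_2)}{\mathrm{occ}(t_m,\rho_2)} & \cdots & \dfrac{\mathrm{occ}_\circ((t_m,t_m),\rho_2)}{\mathrm{occ}(t_m,\rho_2)} \end{pmatrix}$$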



Classical analysis of ergodic Markov chains grants the existence of a unique probability vector $\nu$ s.t. $\nu P = \nu$, i.e.,

$$\forall\, 1 \le i \le m,\quad \nu_i = \sum_{1 \le j \le m} \nu_j \cdot \frac{\mathrm{occ}_\circ((t_j,t_i),\rho_2)}{\mathrm{occ}(t_j,\rho_2)}.$$

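This vector $\nu$ represents the occurrence frequency of each state in an infinite run over the Markov chain. It is easy to see that the unique probability vector $\nu$ that satisfies $\nu P = \nu$ is

$$\nu = \left( \frac{\mathrm{occ}(t_1,\rho_2)}{|\rho_2|}, \ldots, \frac{\mathrm{occ}(t_m,\rho_2)}{|\rho_2|} \right).$$

Indeed, with this choice, the $i$-th component of $\nu P$ is $\sum_{j} \mathrm{occ}_\circ((t_j,t_i),\rho_2)/|\rho_2| = \mathrm{occ}(t_i,\rho_2)/|\rho_2|$, since every occurrence of $t_i$ in the cycle $\rho_2 \cdot \mathrm{First}(\rho_2)$ has exactly one predecessor.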

Moreover, given a transition of the Markov chain, its frequency is simply the product of the frequency of its starting state by the probability of the transition when the chain is in this state: for all $t, t' \in T$, we have $\mathrm{freq}^{M_e}_\infty((t,t')) = \nu(t) \cdot P(t,t')$. By definition of $\nu$ and $P$, that is

$$\mathrm{freq}^{M_e}_\infty((t,t')) = \frac{\mathrm{occ}_\circ((t,t'),\rho_2)}{|\rho_2|} = \mathrm{freq}_\infty((t,t')),$$

thus proving that the randomized strategy $\lambda_1^{rm}$ yields the same mean-payoff and parity as the pure finite-memory one $\lambda_1^{pf}$.
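The stationarity argument can also be checked mechanically with exact arithmetic. The Python sketch below builds the matrix P of the end component from a cycle ρ2 and takes ν(t) = occ(t, ρ2)/|ρ2|. It then verifies both νP = ν and ν(t)·P(t, t') = occ∘((t, t'), ρ2)/|ρ2|. The function name is ours, and the check runs on examples only; it does not replace the argument above.

```python
from collections import Counter
from fractions import Fraction

def check_frequencies(rho2):
    """Check exactly that nu(t) = occ(t, rho2) / |rho2| is stationary for the
    chain P of the proof and that nu(t) * P(t, t') = freq_oo((t, t'))."""
    cycle = rho2 + [rho2[0]]                    # rho2 . First(rho2)
    occ_state = Counter(rho2)                   # occ(t, rho2)
    occ_edge = Counter(zip(cycle, cycle[1:]))   # occ_o((t, t'), rho2)
    n = len(rho2)
    P = {(t, u): Fraction(c, occ_state[t]) for (t, u), c in occ_edge.items()}
    nu = {t: Fraction(c, n) for t, c in occ_state.items()}
    # stationarity: nu_i = sum_j nu_j * P(t_j, t_i)
    stationary = all(
        nu[ti] == sum(nu[tj] * P.get((tj, ti), 0) for tj in nu) for ti in nu
    )
    # frequency of each transition in the chain vs. in the pure outcome
    same_freq = all(
        nu[t] * p == Fraction(occ_edge[(t, u)], n) for (t, u), p in P.items()
    )
    return stationary and same_freq


print(check_frequencies(list("abcab")))  # True
```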

Now it remains to show that this does not carry over to two-player games. Indeed, we show that randomized memoryless strategies cannot replace pure finite-memory ones for the expectation semantics, even without parity. By Lemma 12, this implies that they cannot replace them for the 1-satisfaction semantics either. Consider the game depicted in Fig. 5. Player $\mathcal{P}_1$ has a pure finite-memory strategy $\lambda_1^{pf}$ that ensures $\mathsf{MP}(\pi) \geq (0, 0)$ for every strategy $\lambda_2$ of $\mathcal{P}_2$. This strategy simply takes the opposite choice of $\mathcal{P}_2$: $\lambda_1^{pf}(\ast s_2 s_4) = s_6$ and $\lambda_1^{pf}(\ast s_3 s_4) = s_5$, where $\ast$ stands for any history prefix. Now suppose $\mathcal{P}_1$ uses a randomized memoryless strategy $\lambda_1^{rm}$ s.t. $\lambda_1^{rm}(s_4)(s_5) = p$ and $\lambda_1^{rm}(s_4)(s_6) = 1 - p$, for some $p \in [0, 1]$. We claim that whatever the value of $p$, there exists a counter-strategy $\lambda_2$ for $\mathcal{P}_2$ s.t. $\mathbb{E}^{\lambda_1^{rm},\lambda_2}_{s_{\mathrm{init}}}(\mathsf{MP}) \neq (0, 0)$. Suppose $p \geq 1/2$ and let $\lambda_2(s_1) = s_2$. Then, we have

$$\mathbb{E}^{\lambda_1^{rm},\lambda_2}_{s_{\mathrm{init}}}(\mathsf{MP}) = \frac{(1,-1) + p \cdot (1,-1) + (1-p) \cdot (-1,1)}{4} \neq (0, 0).$$

Now suppose $p < 1/2$ and let $\lambda_2(s_1) = s_3$. Then, we have

$$\mathbb{E}^{\lambda_1^{rm},\lambda_2}_{s_{\mathrm{init}}}(\mathsf{MP}) = \frac{(-1,1) + p \cdot (1,-1) + (1-p) \cdot (-1,1)}{4} \neq (0, 0).$$

This shows that memory is needed to achieve the (0, 0)-expectation objective and concludes our proof. □
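The counter-argument for two-player games can be replayed numerically. The Python sketch below rests on one assumption suggested by the two expectations in the proof: edges (s1, s2) and (s4, s5) carry weight (1, −1), edges (s1, s3) and (s4, s6) carry weight (−1, 1), and all other edges carry (0, 0). Under this assumption every round from s1 back to s1 has the same length. The expected mean-payoff is then the expected weight of one round divided by that length, so it is (0, 0) exactly when the round weight is.

```python
from fractions import Fraction

# Assumed weights (inferred from the proof of Lemma 13); other edges: (0, 0).
W_P2 = {"s2": (1, -1), "s3": (-1, 1)}   # P2's choice at s1
W_P1 = {"s5": (1, -1), "s6": (-1, 1)}   # P1's choice at s4

def add(u, v, c=1):
    """u + c * v for 2-dimensional weight vectors."""
    return (u[0] + c * v[0], u[1] + c * v[1])

def round_weight_randomized(p, p2_choice):
    """Expected weight of one round when P1 plays s5 with probability p."""
    w = W_P2[p2_choice]
    w = add(w, W_P1["s5"], p)
    return add(w, W_P1["s6"], 1 - p)

def round_weight_pure(p2_choice):
    """P1 with one bit of memory: answers s2 with s6 and s3 with s5."""
    answer = {"s2": "s6", "s3": "s5"}[p2_choice]
    return add(W_P2[p2_choice], W_P1[answer])

# Counter-strategy from the proof: s2 if p >= 1/2, s3 otherwise.
replies = {Fraction(k, 10): ("s2" if k >= 5 else "s3") for k in range(11)}
print(all(round_weight_randomized(p, c) != (0, 0) for p, c in replies.items()))
print(all(round_weight_pure(c) == (0, 0) for c in ("s2", "s3")))
```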
5.3 Randomization and single mean-payoff parity games 

Randomized memoryless strategies can replace pure finite-memory ones for single mean-payoff parity games. 

Proof outline. We prove it in two steps. First, we show that it holds in the simpler case of MP Büchi games. Suppose $\mathcal{P}_1$ has a pure finite-memory winning strategy for such a game. We use the existence of particular pure memoryless strategies on winning states: the classical attractor for Büchi states, and a strategy ensuring that cycles of the outcome have nonnegative energy (whose existence follows from [14]). We build an almost-sure winning randomized memoryless strategy for $\mathcal{P}_1$ by mixing those strategies in the probability distributions, with sufficient probability on the strategy that is good for energy. We illustrate this construction on the simple game $G_p$ depicted in Fig. 6. Let $\lambda_1^{pf} \in \Lambda_1^{PF}$ be a strategy of $\mathcal{P}_1$ s.t. $\mathcal{P}_1$ plays $(s_1, s_1)$ 8 times, then plays $(s_1, s_2)$ once, and so on. This strategy is surely winning for the objective $\phi = \mathsf{MeanPayoff}_{G_p}(3/5)$. Obviously, $\mathcal{P}_1$ has a pure memoryless strategy that ensures winning for the Büchi objective: playing $(s_1, s_2)$. On the other hand, he also has a pure memoryless strategy that ensures cycles of nonnegative energy: playing $(s_1, s_1)$. Let $\lambda_1^{rm} \in \Lambda_1^{RM}$ be the strategy that plays $(s_1, s_2)$ with probability $\gamma$ and $(s_1, s_1)$ with the remaining probability. This strategy is almost-surely winning for $\phi$ for sufficiently small values of $\gamma$ (e.g., $\gamma = 1/9$). Second, we extend this result to MP parity games using an induction on the number of priorities and the size of games. We consider subgames that reduce to the MP Büchi and MP coBüchi cases (for the latter, pure memoryless strategies are known to suffice [18]).
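A renewal argument shows why γ = 1/9 is a natural choice in this example; this is our own illustration, not the proof of the paper. Under the mixed strategy, the number of self-loops between two excursions is geometric with mean (1 − γ)/γ. For γ = 1/9 this mean is 8, the number of self-loops of the pure strategy. The weights of Fig. 6 are not reproduced here, so the Python sketch below uses hypothetical ones: weight 1 on (s1, s1), and a total weight of −1 over a two-edge excursion through s2.

```python
from fractions import Fraction

# Hypothetical weights (not read off Fig. 6): the self-loop (s1, s1) carries
# LOOP, and one excursion s1 -> s2 -> s1 carries EXC in total over EXC_LEN edges.
LOOP, EXC, EXC_LEN = 1, -1, 2

def mp_pure(k):
    """Mean-payoff of the pure strategy: k self-loops, then one excursion, repeat."""
    return Fraction(k * LOOP + EXC, k + EXC_LEN)

def mp_randomized(gamma):
    """Almost-sure mean-payoff of the mixed strategy (renewal-reward theorem):
    the number of self-loops between two excursions is geometric with mean
    (1 - gamma) / gamma."""
    loops = (1 - gamma) / gamma
    return (loops * LOOP + EXC) / (loops + EXC_LEN)


print(mp_pure(8), mp_randomized(Fraction(1, 9)))  # 7/10 7/10
```

Smaller values of γ put even more weight on the good-for-energy self-loop, which is the intuition behind "sufficiently small values of γ".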

Büchi case. A particular, simpler case of the parity objective is the Büchi objective. It corresponds to parity with priorities {0, 1}. We denote a Büchi game by $G = (S_1, S_2, s_{\mathrm{init}}, E, w, F)$, with $F$ the set of Büchi states, s.t. a play is winning if it visits states of $F$ infinitely often. We first state results on these Büchi objectives, as they are conceptually simpler to understand. Proof arguments for parity are more involved and make use of results on Büchi objectives.

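We first introduce the useful notion of $\varepsilon$-optimality. Given a game $G_p$ with a one-dimension⁵ mean-payoff objective, we define its value as $\mathrm{val} = \sup_{\lambda_1 \in \Lambda_1} \inf_{\lambda_2 \in \Lambda_2} \sup \{ v \mid \mathsf{Outcome}_{G_p}(\lambda_1, \lambda_2) \subseteq \mathsf{MeanPayoff}_{G_p}(v) \}$.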



5 The multi-dimensional setting gives rise to incomparable outcomes and the need to consider Pareto-optimality. 
A strategy $\lambda_1 \in \Lambda_1$ is said to be optimal for the mean-payoff objective if it achieves this value. Such a strategy need not exist in general, even in one-player games [18,9,15] (in Fig. 7, $\mathcal{P}_1$ has to delay its visits of $s_1$ for longer and longer intervals). However, it is known that for all $\varepsilon > 0$, $\varepsilon$-optimal strategies (i.e., strategies that achieve value $\mathrm{val} - \varepsilon$) always exist in one-dimension mean-payoff games, as a consequence of Martin's theorem on Borel determinacy [31].




Fig. 7. Mean-payoff Büchi requires infinite memory for optimality.

Fig. 8. Stochastic process depicting alternation between sequences of edges from $\lambda_1^{gfe}$ and $\lambda_1^{gfB}$.

Here, we show that finite-memory strategies can be traded for randomized memoryless ones in mean-payoff Büchi games. Precisely, we prove that $\varepsilon$-optimality for mean-payoff Büchi games can also be achieved by randomized memoryless strategies. We first state two useful lemmas granting the existence of pure memoryless strategies that are resp. good-for-energy or good-for-Büchi, in all states that are winning for the mean-payoff Büchi objective. These strategies will help us build the needed $\varepsilon$-optimal strategies.

Lemma 14 (Extension of [14, Lemma 4]). Let $G = (S_1, S_2, s_{\mathrm{init}}, E, w, F)$, with $F$ the set of Büchi states. Let $\mathrm{Win} \subseteq S$ be the set of winning states for the mean-payoff Büchi objective with threshold 0. For all $s \in \mathrm{Win}$, $\mathcal{P}_1$ has a uniform (i.e., independent of the starting state) memoryless good-for-energy strategy $\lambda_1^{gfe}$ whose outcome never leaves the set $\mathrm{Win}$, s.t. any cycle $c$ of this outcome has energy $\mathrm{EL}(c) \geq 0$.

Lemma 15 (Classical attractor). Let $G = (S_1, S_2, s_{\mathrm{init}}, E, w, F)$, with $F$ the set of Büchi states. Let $\mathrm{Win} \subseteq S$ be the set of winning states for the mean-payoff Büchi objective with threshold 0. For all $s \in \mathrm{Win}$, $\mathcal{P}_1$ has a uniform (i.e., independent of the starting state) memoryless good-for-Büchi strategy $\lambda_1^{gfB}$, an attractor strategy for $F$, whose outcome never leaves the set $\mathrm{Win}$, s.t. it ensures reaching $F$ in at most $|S|$ steps.

The randomized memoryless strategy of $\mathcal{P}_1$ will thus consist in mixing these two strategies, with a very low probability on the good-for-Büchi strategy. Indeed, the Büchi objective will be satisfied whatever this probability is, provided it is strictly positive. On the other hand, by giving more weight to the good-for-energy strategy, $\mathcal{P}_1$ can obtain a mean-payoff that is arbitrarily close to the optimum.

Lemma 16. In mean-payoff Büchi games, $\varepsilon$-optimality can be achieved surely by pure finite-memory strategies and almost-surely by randomized memoryless strategies.

Proof. Let $G = (S_1, S_2, s_{\mathrm{init}}, E, w, F)$, with $F$ the set of Büchi states. We consider the mean-payoff objective with threshold 0 (w.l.o.g.). Let $\mathrm{Win} \subseteq S$ be the set of winning states for the mean-payoff Büchi objective. By Lemma 14 and Lemma 15, for all $s \in \mathrm{Win}$, $\mathcal{P}_1$ has two uniform memoryless strategies $\lambda_1^{gfe}$ and $\lambda_1^{gfB}$, whose outcomes never leave the set $\mathrm{Win}$, s.t. $\lambda_1^{gfe}$ ensures that any cycle $c$ of its outcome has energy $\mathrm{EL}(c) \geq 0$, and $\lambda_1^{gfB}$, an attractor strategy for $F$, ensures reaching $F$ in at most $|S|$ steps.

We first build $\varepsilon$-optimal pure finite-memory strategies based on these two pure memoryless strategies. Let $\varepsilon > 0$. As usual, $W$ denotes the largest absolute weight on any edge. Let us define $\lambda_1^{pf}$ s.t. (a) it plays $\lambda_1^{gfe}$ for $\lceil 2 \cdot W \cdot |S| / \varepsilon \rceil - |S|$ steps, then (b) it plays $\lambda_1^{gfB}$ for $|S|$ steps, then again (a). This ensures that $F$ is visited infinitely often, as $\lambda_1^{gfB}$ is played infinitely many times for $|S|$ steps in a row. Furthermore, the total cost of phases (a) + (b) is bounded from below by $-2 \cdot W \cdot |S|$ (each phase contributes at least $-W \cdot |S|$), and thus the mean-payoff of the outcome is at least $-\varepsilon$, against any strategy of the adversary.
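The phase lengths of this pure finite-memory strategy are straightforward to compute. The Python sketch below uses helper names of our own. It returns the lengths of phases (a) and (b) for given W, |S| and ε, together with the resulting worst-case mean weight over one period, which by the argument above is at least −ε. It assumes ε ≤ 2·W, so that phase (a) has nonnegative length.

```python
import math
from fractions import Fraction

def phase_lengths(W, n_states, eps):
    """Lengths of phase (a) (good-for-energy strategy) and phase (b)
    (attractor for |S| steps); together they last ceil(2 * W * |S| / eps)."""
    period = math.ceil(Fraction(2 * W * n_states) / eps)
    return period - n_states, n_states

def worst_case_mean(W, n_states, eps):
    """Lower bound on the mean weight of one period: the cost of each phase
    is at least -W * |S|, hence at least -2 * W * |S| overall."""
    a, b = phase_lengths(W, n_states, eps)
    return Fraction(-2 * W * n_states, a + b)


eps = Fraction(1, 10)
print(phase_lengths(3, 4, eps), worst_case_mean(3, 4, eps) >= -eps)  # (236, 4) True
```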
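Second, we show that, based on the same pure memoryless strategies, it is possible to obtain almost-surely $\varepsilon$-optimal randomized memoryless strategies, i.e.,

$$\forall \varepsilon > 0,\ \exists \lambda_1^{rm} \in \Lambda_1^{RM},\ \forall \lambda_2 \in \Lambda_2,\quad \mathbb{P}^{\lambda_1^{rm},\lambda_2}_{s_{\mathrm{init}}}(\mathrm{Par}(\pi) \bmod 2 = 0) = 1 \ \wedge\ \mathbb{P}^{\lambda_1^{rm},\lambda_2}_{s_{\mathrm{init}}}(\mathsf{MP}(\pi) \geq -\varepsilon) = 1.$$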

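Note that pure memoryless strategies suffice for $\mathcal{P}_2$, as he essentially has to win against either the Büchi or the mean-payoff criterion [9]. Therefore, given $\varepsilon > 0$, we need to build some strategy $\lambda_1^{rm} \in \Lambda_1^{RM}$ s.t.

$$\forall \lambda_2^{pm} \in \Lambda_2^{PM},\quad \mathbb{P}^{\lambda_1^{rm},\lambda_2^{pm}}_{s_{\mathrm{init}}}(\mathrm{Par}(\pi) \bmod 2 = 0) = 1 \ \wedge\ \mathbb{P}^{\lambda_1^{rm},\lambda_2^{pm}}_{s_{\mathrm{init}}}(\mathsf{MP}(\pi) \geq -\varepsilon) = 1.$$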
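We build such a strategy as follows:

$$\forall s \in S,\quad \lambda_1^{rm}(s) = \begin{cases} \lambda_1^{gfe}(s) & \text{with probability } 1 - \gamma,\\ \lambda_1^{gfB}(s) & \text{with probability } \gamma, \end{cases}$$

for some well-chosen $\gamma \in\ ]0, 1[$.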

It is straightforward to see that the Büchi objective is almost-surely satisfied for all values of $\gamma > 0$: at all times, the probability of playing according to $\lambda_1^{gfB}$ for $|S|$ steps in a row, and thus ensuring a visit of $F$, is at least $\gamma^{|S|}$, which is strictly positive.

It remains to study whether it is always possible to choose such a constant $\gamma$ s.t. the $\mathsf{MeanPayoff}_{G_p}(-\varepsilon)$ objective is almost-surely satisfied. Consider such a strategy $\lambda_1^{rm} \in \Lambda_1^{RM}$ and some fixed strategy $\lambda_2^{pm} \in \Lambda_2^{PM}$ of $\mathcal{P}_2$: the game reduces to a Markov chain $M_c = (S, \delta, w)$, where $\delta\colon E \to [0, 1]$ is the transition probability function resulting from fixing those strategies. Suppose $\lambda_2^{pm}$ is winning for $\mathcal{P}_2$. Thus, $\mathbb{P}^{\lambda_1^{rm},\lambda_2^{pm}}_{s_{\mathrm{init}}}(\mathsf{MP}(\pi) < -\varepsilon) > 0$. The mean-payoff depends on limit behavior: the probability measure of plays that do not enter an end component (EC) is zero [3], whereas in an EC of expected mean-payoff $v$, we have probability one of obtaining mean-payoff $v$. This implies that there exists some EC $C$ in $M_c$ s.t. $\mathbb{P}_{M_c}(\Diamond\Box C) > 0$ and $\mathbb{E}_C(\mathsf{MP}(\pi)) < -\varepsilon$. We claim that it is possible to choose $\gamma$ s.t. all ECs, in all Markov chains induced by pure memoryless strategies of $\mathcal{P}_2$, have expectation greater than or equal to $-\varepsilon$, thus proving that strategy $\lambda_1^{rm}$ is almost-surely $\varepsilon$-optimal with regard to mean-payoff. Intuitively, the smaller $\gamma$ is chosen, the closer the expected mean-payoff induced by $\lambda_1^{rm}$ will be to the one induced by $\lambda_1^{gfe}$, that is, at least zero. Since the number of pure memoryless strategies of $\mathcal{P}_2$ is finite, and so is the number of ECs induced by $\lambda_1^{rm}$ (regardless of the exact value of $\gamma \in\ ]0, 1[$, we obtain the same ECs in terms of states and edges), one can compute a suitable $\gamma$ for each of them, and then take the minimum to ensure that the needed property is satisfied in all possible cases.

Therefore, let us fix some strategy $\lambda_2^{pm}$ of $\mathcal{P}_2$, and some EC $C$ of the induced Markov chain when played against strategy $\lambda_1^{rm}$ of $\mathcal{P}_1$. To conclude this proof, it remains to show that there exists $\gamma^* \in\ ]0, 1[$ s.t. for all $\gamma < \gamma^*$, we have $\mathbb{E}_C(\mathsf{MP}(\pi)) \geq -\varepsilon$. In $C$, every state of $\mathcal{P}_1$ bears two outgoing edges, one from $\lambda_1^{gfe}$ and one from $\lambda_1^{gfB}$ (we suppose w.l.o.g. that they are distinct), with respective probabilities $1 - \gamma$ and $\gamma$. Consider the stochastic process $M_e$ depicting alternation between sequences of edges from $\lambda_1^{gfe}$ and $\lambda_1^{gfB}$ (Fig. 8).

By definition of $\lambda_1^{gfe}$, a sequence of gfe edges of length $k$ has its energy bounded below by $-W \cdot |S|$ (i.e., the bound does not depend on $k$). Indeed, recall that all cycles have nonnegative energy. Thus, the energy level of a sequence is a sum of nonnegative terms (cycles), plus a sum of at most $|S|$ terms bounded from below by $-W$, as any sequence of more than $|S|$ edges contains cycles. Moreover, each gfB edge has energy bounded below by $-W$. Thus the overall mean-payoff of a play that consists of repeated sequences of $k$ gfe edges followed by one gfB edge is at least $\frac{-W \cdot (|S| + 1)}{k + 1}$. By putting more probability on lengthy sequences of gfe edges, we will thus be able to obtain an overall expected mean-payoff that is closer to zero, and in particular, greater than or equal to $-\varepsilon$. Indeed, we decompose the overall expected mean-payoff according to the length of gfe sequences before seeing a gfB edge. Let $\mathrm{seq}^b_a$ denote a sequence of length $b$ of edges of type $a$. We have:

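$$\begin{aligned} \mathbb{E}_C(\mathsf{MP}(\pi)) &= \sum_{k=0}^{\infty} \mathbb{P}(\mathrm{seq}^k_{gfe}\,\mathrm{seq}^1_{gfB}) \cdot \mathbb{E}(\mathsf{MP} \mid \mathrm{seq}^k_{gfe}\,\mathrm{seq}^1_{gfB})\\ &= \sum_{k=0}^{\infty} (1-\gamma)^k \gamma \cdot \mathbb{E}(\mathsf{MP} \mid \mathrm{seq}^k_{gfe}\,\mathrm{seq}^1_{gfB}). \end{aligned}$$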

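Now we divide this sum in two parts, according to some value $k^*$ s.t. for all $k \geq k^*$, we have

$$\mathbb{E}(\mathsf{MP} \mid \mathrm{seq}^k_{gfe}\,\mathrm{seq}^1_{gfB}) \geq \frac{-W \cdot (|S| + 1)}{k + 1} \geq \frac{-W \cdot (|S| + 1)}{k^* + 1} = -\eta > -\varepsilon.$$

It suffices to take $k^* = \lceil W \cdot (|S| + 1) / \varepsilon \rceil$ to achieve this. Notice that the mean-payoff of a play is also trivially bounded below by $-W$, as $W$ is the largest absolute weight on any edge. We obtain: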

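$$\begin{aligned} \mathbb{E}_C(\mathsf{MP}(\pi)) &\geq \sum_{k=0}^{k^*-1} (1-\gamma)^k \gamma \cdot (-W) + \sum_{k=k^*}^{\infty} (1-\gamma)^k \gamma \cdot (-\eta)\\ &\geq k^* \gamma \cdot (-W) + (1 - k^* \gamma) \cdot (-\eta). \end{aligned}$$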

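The second inequality holds since $\sum_{k<k^*} (1-\gamma)^k \gamma \leq k^* \gamma$ and $\eta \leq W$. Thus, one can achieve $\mathbb{E}_C(\mathsf{MP}(\pi)) \geq -\varepsilon$ by choosing any $\gamma < \gamma^* = \frac{\varepsilon - \eta}{k^* \cdot (W - \eta)}$. Notice that such a $\gamma^*$ indeed lies in $]0, 1[$ for sufficiently small values of $\varepsilon$, independently of the values of $|S|$ and $W$. Since we are interested in $\varepsilon$ arbitrarily close to zero, this concludes our proof. □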
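The constants of this proof can be evaluated with exact arithmetic. The Python sketch below computes k*, η and γ* for given W, |S| and ε. It takes k* = ⌈W·(|S|+1)/ε⌉, which is one valid choice. It then checks on an example that, for γ below γ*, both the geometric-sum bound and its linear relaxation stay above −ε. The function names are ours.

```python
import math
from fractions import Fraction

def gamma_star(W, n_states, eps):
    """k*, eta and gamma* as in the proof (exact arithmetic); any k* with
    W * (|S| + 1) / (k* + 1) < eps would do, we take the ceiling."""
    c = W * (n_states + 1)
    k_star = math.ceil(Fraction(c) / eps)
    eta = Fraction(c, k_star + 1)                  # -eta > -eps
    g_star = (eps - eta) / (k_star * (W - eta))
    return k_star, eta, g_star

def lower_bounds(W, k_star, eta, gamma):
    """Geometric-sum bound and its linear relaxation from the proof."""
    head = 1 - (1 - gamma) ** k_star   # P(fewer than k* gfe edges before gfB)
    exact = head * (-W) + (1 - head) * (-eta)
    linear = k_star * gamma * (-W) + (1 - k_star * gamma) * (-eta)
    return exact, linear


eps = Fraction(1, 2)
k_star, eta, g = gamma_star(2, 3, eps)            # W = 2, |S| = 3
exact, linear = lower_bounds(2, k_star, eta, g / 2)
print(k_star, eta, g, exact >= linear >= -eps)    # 16 8/17 1/832 True
```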

Parity Case. Given those results for mean-payoff Büchi games, we now consider the more general case of mean-payoff parity games. We start by introducing the useful notion of subgames.

Subgame. Let $G_p = (S_1, S_2, s_{\mathrm{init}}, E, k, w, p)$ be a game and $A \subseteq S$ be a subset of states in $G_p$. If $E$ is such that for all $s \in A$, there exists $s' \in A$ with $(s, s') \in E$, then we define the subgame $G_p \restriction A$ as $(S_1 \cap A, S_2 \cap A, E \cap (A \times A), w', p')$, where $w'$, $p'$ are the functions $w$, $p$ restricted to the subdomain $A$. Note that for subgames, we do not consider an initial state.

Let $G_p = (S_1, S_2, s_{\mathrm{init}}, E, k, w, p)$ and $U \subseteq S$. We define $\mathrm{Attr}_1(U)$ as the set obtained as the limit of the following increasing sequence: $U_0 = U$, and $U_i = U_{i-1} \cup \{s \in S_1 \mid \exists s' \in U_{i-1},\ (s, s') \in E\} \cup \{s \in S_2 \mid \forall s',\ (s, s') \in E \Rightarrow s' \in U_{i-1}\}$, for $i \geq 1$. As this sequence of sets is increasing, there exists $i \leq |S|$ such that $U_j = U_i$ for all $j \geq i$. $\mathrm{Attr}_1(U)$ contains all the states in $G$ from which $\mathcal{P}_1$ can force a visit to $U$, and it is well known that $\mathcal{P}_1$ has a pure memoryless strategy to force such a visit from those states. Also, it is clear that $\mathcal{P}_1$ does not have a strategy to leave the states in $S \setminus \mathrm{Attr}_1(U)$. Attractors can be defined symmetrically for $\mathcal{P}_2$ and are denoted $\mathrm{Attr}_2(\cdot)$. As a direct consequence, we have the following proposition.
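The attractor is a standard fixpoint computation. The Python sketch below uses names of our own, gives edges as a set of pairs, and assumes every state has a successor. It computes Attr_1(U) together with a pure memoryless attractor strategy for P1. It adds states as soon as they qualify rather than level by level, which reaches the same fixpoint.

```python
def attractor(S1, S2, E, U):
    """Return (Attr_1(U), strategy): the states from which P1 can force a
    visit to U, and a pure memoryless attractor strategy of P1 on them.
    S1, S2: sets of states of P1 and P2; E: set of edges (s, s')."""
    succ = {s: {t for (u, t) in E if u == s} for s in S1 | S2}
    attr, strategy = set(U), {}
    changed = True
    while changed:                      # at most |S| productive passes
        changed = False
        for s in (S1 | S2) - attr:
            if s in S1 and succ[s] & attr:
                # P1 picks a successor already in the attractor
                strategy[s] = min(succ[s] & attr)
                attr.add(s)
                changed = True
            elif s in S2 and succ[s] <= attr:
                # every move of P2 leads into the attractor
                attr.add(s)
                changed = True
    return attr, strategy


S1, S2 = {"a", "c", "u"}, {"b", "d"}
E = {("a", "b"), ("a", "d"), ("b", "a"), ("b", "c"),
     ("c", "u"), ("u", "u"), ("d", "u")}
attr, strategy = attractor(S1, S2, E, {"u"})
print(sorted(attr), strategy)
```

States are added in an order compatible with their distance to U. Each recorded move of P1 therefore points to a state added earlier, and following the strategy reaches U in finitely many steps.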

Proposition 1. Let $G_p = (S_1, S_2, s_{\mathrm{init}}, E, k, w, p)$ be a game, and let $U \subseteq S$ be such that $B = S \setminus \mathrm{Attr}_1(U)$ is non-empty. Then $G_p \restriction B$ is a subgame.

The following lemma states that optimal pure memoryless strategies exist for $\mathcal{P}_1$ in games with mean-payoff coBüchi objectives (i.e., parity with priorities {1, 2}). For mean-payoff Büchi objectives, we showed in Lemma 16 that, for all $\varepsilon > 0$, $\varepsilon$-optimal randomized memoryless strategies exist.

Lemma 17 ([18, Theorem 5]). Let $G_p = (S_1, S_2, s_{\mathrm{init}}, E, k, w, p)$ be a game with priorities {1, 2}, and $\mathrm{Win}_{\geq 0}$ be the set of nodes in $G_p$ from which $\mathcal{P}_1$ wins the mean-payoff coBüchi objective for threshold 0 (w.l.o.g.). Then from all states in $\mathrm{Win}_{\geq 0}$, $\mathcal{P}_1$ has a pure memoryless winning strategy for the coBüchi mean-payoff objective for threshold 0.

We now establish that e-optimal randomized memoryless strategies also exist for mean-payoff parity 
games, and thus, can replace pure finite-memory ones. 
Lemma 18. Let $G_p = (S_1, S_2, s_{\mathrm{init}}, E, k, w, p)$ and $\mathrm{Win}_{\geq 0}$ be the set of nodes in $G_p$ from which $\mathcal{P}_1$ wins the mean-payoff parity objective for threshold 0. Then for all $\varepsilon > 0$, there exists $\lambda_1^{rm} \in \Lambda_1^{RM}$ s.t. for all $s \in \mathrm{Win}_{\geq 0}$ and for all $\lambda_2 \in \Lambda_2$, we have:

$$\mathbb{P}^{\lambda_1^{rm},\lambda_2}_{s}(\mathsf{MP}(\pi) \geq -\varepsilon) = 1 \ \wedge\ \mathbb{P}^{\lambda_1^{rm},\lambda_2}_{s}(\mathrm{Par}(\pi) \bmod 2 = 0) = 1.$$

Proof. The proof is by induction on the lexicographic order $\preceq$ on games, defined as follows: $G'_p \preceq G_p$ if $G'_p$ has fewer priorities than $G_p$, or $G'_p$ has the same number of priorities as $G_p$ but fewer states. Clearly, this lexicographic order is well-founded.

The base cases are twofold: one for the number of states, and one for priorities. First, if the game is such that $|S| = 1$, then obviously, if $\mathcal{P}_1$ can win, he can do so with a pure memoryless strategy, which respects the claim. Second, consider games with two priorities. W.l.o.g., we can assume that all priorities are either in {0, 1} or in {1, 2}. Those cases resp. correspond to mean-payoff Büchi and mean-payoff coBüchi games. The result for mean-payoff Büchi games has been established in Lemma 16, while the result for mean-payoff coBüchi games is a direct consequence of Lemma 17, as pure memoryless strategies are a special case of randomized memoryless strategies.

Let us now consider the inductive case. Suppose we have a mean-payoff parity game $G_p$ with $m$ priorities and $|S|$ states. W.l.o.g., we can assume that the lowest priority in $G_p$ is either 0 or 1; otherwise, we subtract an even number from all priorities so that we are in that case. Let $U_0 = \{s \in \mathrm{Win}_{\geq 0} \mid p(s) = 0\}$ and $U_1 = \{s \in \mathrm{Win}_{\geq 0} \mid p(s) = 1\}$. We consider the two possible situations, corresponding to $U_0$ being empty or not.

1. $U_0$ empty. In that case $U_1$ is not empty. Let us consider $A_2 = \mathrm{Attr}_2(U_1)$, the attractor of $\mathcal{P}_2$ for $U_1$. It must be the case that $\mathrm{Win}_{\geq 0} \setminus A_2$ is non-empty, otherwise this would contradict the fact that $\mathcal{P}_1$ is winning the parity objective from states in $\mathrm{Win}_{\geq 0}$. Indeed, if it were not the case, then $\mathcal{P}_2$ would be able to force an infinite number of visits to $U_1$ from all states in $\mathrm{Win}_{\geq 0}$, and the parity would be odd as $U_0$ is empty, a contradiction with the definition of $\mathrm{Win}_{\geq 0}$. (i) Let $B = \mathrm{Win}_{\geq 0} \setminus A_2$. First note that, as $B$ is non-empty, by Proposition 1, $G_p \restriction B$ is a subgame. Also, note that from all states in $B$, it must be the case that $\mathcal{P}_1$ has a winning strategy that does not require visits to states outside $B$, i.e., states in $A_2$, for otherwise this would lead to a contradiction with the fact that $\mathcal{P}_1$ is winning the parity objective in $\mathrm{Win}_{\geq 0}$. So all states in the subgame $G_p \restriction B$ are winning for $\mathcal{P}_1$. The game $G_p \restriction B$ contains no states with priority 1 (nor with priority 0, as $U_0$ is empty), and so we can apply our induction hypothesis to conclude that $\mathcal{P}_1$ has a memoryless randomized strategy from all states in $B$, as $(G_p \restriction B) \preceq G_p$ since it has one less priority. (ii) Now, let us concentrate on states in $A_2$. Let $A_1 = \mathrm{Attr}_1(B)$. From states in $A_1$, $\mathcal{P}_1$ has a pure memoryless strategy to reach states in $B$, and so from there $\mathcal{P}_1$ can play as in $G_p \restriction B$, and we are done. Let $C = A_2 \setminus A_1$. If $C$ is empty, we are done. Otherwise, by Proposition 1, $G_p \restriction C$ is a subgame ($\mathcal{P}_2$ can force the play to stay within $C$). We conclude that all states in this game must be winning for $\mathcal{P}_1$. This game has the same minimal priority as the original game (i.e., priority 1) but it has at least one state less, and so we can apply our induction hypothesis to conclude that $\mathcal{P}_1$ has a memoryless randomized strategy from all states in $C$. Therefore, by (i) and (ii), $\mathcal{P}_1$ has a memoryless randomized strategy from all states in $\mathrm{Win}_{\geq 0}$, which proves the claim in that case.

2. $U_0$ is not empty. Let us consider $A_1 = \mathrm{Attr}_1(U_0)$. (iii) First, consider the case where $A_1 = \mathrm{Win}_{\geq 0}$. In this case, $\mathcal{P}_1$ can force a visit to states in $U_0$ from any state in $\mathrm{Win}_{\geq 0}$. So, we conclude that $\mathcal{P}_1$ wins in $G_p$ the mean-payoff Büchi game with threshold 0, and by Lemma 16, $\mathcal{P}_1$ has a memoryless randomized strategy from all states in $G_p$ for almost-surely winning the parity game with mean-payoff threshold $-\varepsilon$, so we are done. (iv) Second, consider the case where $B = \mathrm{Win}_{\geq 0} \setminus A_1$ is non-empty. Then by Proposition 1, $G_p \restriction B$ is a subgame. So $\mathcal{P}_2$ can force the play to stay within $B$ in the original game, and we conclude that all states in the game $G_p \restriction B$ are winning for $\mathcal{P}_1$. As $G_p \restriction B$ does not contain states of priority 0, and thus has at least one less priority, we can apply the induction hypothesis to conclude that $\mathcal{P}_1$ has a memoryless randomized strategy from all states in $B$. Therefore, by (iii) and (iv), $\mathcal{P}_1$ has a memoryless randomized strategy from all states in $\mathrm{Win}_{\geq 0}$, which also proves the case.

As we have proved the claim in both possible cases, this concludes the proof. □ 
5.4 Summary for randomization

We sum up results for these different classes of games in Theorem 3 (cf. Table 1). 

Theorem 3 (Trading finite memory for randomness). The following assertions hold: (1) Randomized strategies are exactly as powerful as pure strategies for energy objectives. Randomized memoryless strategies are not as powerful as pure finite-memory strategies for almost-sure winning in one-player and two-player energy, multi energy, energy parity and multi energy parity games. (2) Randomized memoryless strategies are not as powerful as pure finite-memory strategies for almost-sure winning in two-player multi mean-payoff games. (3) In one-player multi mean-payoff parity games, and two-player single mean-payoff parity games, if there exists a pure finite-memory sure winning strategy, then there exists a randomized memoryless almost-sure winning strategy.

Proof. (1) For energy games, results follow from Lemma 11. (2) For two-player multi mean-payoff games, they follow from Lemma 13. (3) For one-player multi mean-payoff parity games, they follow from Lemma 13. For two-player single mean-payoff parity games, they are a direct consequence of Lemma 18. □

We close this section by observing that there are even more powerful classes of strategies. Their study, 
as well as their practical interest, remains open. 

Lemma 19. Randomized finite-memory strategies are strictly more powerful than both randomized memoryless and pure finite-memory strategies for multi mean-payoff games with expectation semantics, even in the one-player case.



Fig. 9. Randomized finite memory is strictly more powerful than randomized memorylessness and pure finite memory.

The intuition is essentially that memory allows achieving an exact payoff by sticking to a given side, while randomization allows combining the payoffs of pure strategies to achieve any linear combination in between.

Proof. Consider the game $G$ depicted in Fig. 9. Whatever the pure finite-memory strategy of $\mathcal{P}_1$, the only achievable mean-payoff values are $(1, -1)$ (if $(s_0, s_1)$ is never taken) and $(-1, 1)$ (if $(s_0, s_1)$ is taken). This is also true for randomized memoryless strategies: either the probability of $(s_0, s_1)$ is null and the mean-payoff has value $(1, -1)$, or this probability is strictly positive, and the mean-payoff has value $(-1, 1)$, as the probability mass eventually reaches $s_1$. On the contrary, value $(0, 0)$ is achievable by a randomized finite-memory strategy. Indeed, consider the strategy that tosses a coin on its first visit of $s_0$ to decide whether it will always play $(s_0, s_0)$, or play $(s_0, s_1)$ and then always $(s_1, s_1)$. This strategy only needs one bit of memory and one bit to encode probabilities, and still, it is strictly more powerful than any amount of pure memory or any arbitrarily high precision for probabilities without memory. □
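The comparison in this proof reduces to expectations over the two possible limit behaviours. In the Python sketch below (names ours), the limit mean-payoffs (1, −1) of looping on s0 and (−1, 1) of moving to s1 are taken from the proof. The coin-tossing strategy with bias q has expectation q·(1, −1) + (1 − q)·(−1, 1), which is (0, 0) for q = 1/2. A randomized memoryless strategy, by contrast, can only produce one of the two extreme values.

```python
from fractions import Fraction

# Limit mean-payoff of each eventual behaviour in the game of Fig. 9:
# staying on s0 forever yields (1, -1); moving to s1 and looping yields (-1, 1).
STAY, MOVE = (1, -1), (-1, 1)

def expected_mp_coin(q):
    """Randomized finite-memory strategy: on the first visit of s0, commit with
    probability q to (s0, s0) forever, otherwise take (s0, s1)."""
    return tuple(q * a + (1 - q) * b for a, b in zip(STAY, MOVE))

def mp_randomized_memoryless(p_leave):
    """Randomized memoryless strategy taking (s0, s1) with probability p_leave
    at every visit of s0: s1 is reached almost surely as soon as p_leave > 0."""
    return STAY if p_leave == 0 else MOVE


print(expected_mp_coin(Fraction(1, 2)) == (0, 0))  # True
```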

6 Conclusion 

In this work, we considered the finite-memory strategy synthesis problem for games with multiple quantitative 
(energy and mean-payoff) objectives along with a parity objective. We established tight (matching upper 
and lower) exponential bounds on the memory requirements for such strategies (Theorem 1), significantly 
improving the previous triple exponential bound for multi energy games (without parity) that could be 
derived from results in literature for games on VASS. We presented an optimal symbolic and incremental 
strategy synthesis algorithm (Theorem 2). Finally, we also presented a precise characterization of the trade-off 
of memory for randomness in strategies (Theorem 3). 
A Technical appendix
A.1 Proof of Lemma 5

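We first introduce some notation. Let $T = (Q, R)$ be a self-covering tree (i.e., an epSCT without the parity condition). We define the partial order $\leq$ on $Q$ s.t. for all $\sigma_1, \sigma_2 \in Q$ with $\Theta(\sigma_1) = (t_1, u_1)$ and $\Theta(\sigma_2) = (t_2, u_2)$, we have $\sigma_1 \leq \sigma_2$ iff $t_1 = t_2$ and $u_1 \leq u_2$. We denote the induced equivalence by $\sim$, s.t. $\sigma_1 \sim \sigma_2$ iff $\sigma_1 \leq \sigma_2$ and $\sigma_2 \leq \sigma_1$. For all $\sigma \in Q$, let $\mathrm{Anc}(\sigma)$ and $\mathrm{EnAnc}(\sigma)$ resp. denote the set of ancestors and energy ancestors of $\sigma$ in $T$: $\mathrm{Anc}(\sigma) = \{\vartheta \in Q \setminus \{\sigma\} \mid \vartheta \models \exists\Diamond\sigma\}$, where we use the classical CTL notation to denote that there exists a path from $\vartheta$ to $\sigma$ in $T$, and $\mathrm{EnAnc}(\sigma) = \{\vartheta \in \mathrm{Anc}(\sigma) \mid \vartheta \leq \sigma\}$.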
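
The order $\leq$, the equivalence $\sim$ and the ancestor sets can be sketched directly on labeled nodes. This is a minimal illustration assuming a hypothetical encoding: labels $\Theta(\sigma)$ are pairs (state name, integer vector) and the tree is given by a father map; the function names are not from the paper.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical encoding: a label Theta(node) = (game state, energy vector).
Label = Tuple[str, Tuple[int, ...]]

def leq(a: Label, b: Label) -> bool:
    """sigma1 <= sigma2 iff same state and componentwise smaller-or-equal energy."""
    (t1, u1), (t2, u2) = a, b
    return t1 == t2 and all(x <= y for x, y in zip(u1, u2))

def equiv(a: Label, b: Label) -> bool:
    """sigma1 ~ sigma2 iff both inequalities hold, i.e. identical labels."""
    return leq(a, b) and leq(b, a)

def anc(node: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """Strict ancestors of node, following the father map up to the root."""
    result: Set[str] = set()
    p = parent[node]
    while p is not None:
        result.add(p)
        p = parent[p]
    return result

def en_anc(node: str, parent: Dict[str, Optional[str]],
           theta: Dict[str, Label]) -> Set[str]:
    """Energy ancestors: ancestors whose label is <= the label of node."""
    return {a for a in anc(node, parent) if leq(theta[a], theta[node])}
```

For a path $r \to a \to b$ where $r$ and $b$ share a state and $b$ has at least $r$'s energy in every dimension, `en_anc("b", ...)` returns `{"r"}`.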

We build a sequence of DAGs $(D_i)_{0 \leq i \leq n} = D_0 = T, D_1, D_2, \ldots, D_n$ s.t. for all $1 \leq i \leq n$, $D_i$ is obtained from $D_{i-1}$ by merging two equivalent nodes of the same minimal level (i.e., closest to the root) of $D_{i-1}$.

The sequence stops when we obtain a DAG $D_n = (Q_n, R_n)$ s.t. for every level $j$ of $D_n$, there do not exist two distinct equivalent nodes on level $j$. This construction induces merges by increasing depth, starting with level one. Moreover, if a DAG $D_i$ of the sequence is the result of merges up to level $j$, then it has the tree property (i.e., every node has a unique father) for levels greater than $j$. As the depth and the branching degree of $T$ are finite, the defined sequence of DAGs is finite (and actually bounded).

Let us give a formal definition of the merge operation. Consider such a DAG $D_i = (Q_i, R_i)$. Let $j$ be the minimal level of $D_i$ that contains two equivalent nodes. Let $\sigma_1, \sigma_2 \in Q_i(j)$ (i.e., nodes of level $j$) be two nodes s.t. $\sigma_1 \neq \sigma_2$ and $\sigma_1 \sim \sigma_2$. We suppose w.l.o.g. an arbitrary order on nodes of the same level so that $\sigma_1, \sigma_2$ are the two leftmost nodes that satisfy this condition. We define $D_{i+1} = (Q_{i+1}, R_{i+1}) = \mathrm{merge}(D_i)$ as the result of the following transformation:

- $Q_{i+1} = Q_i \setminus (\{\sigma_2\} \cup \{\sigma \in Q_i \mid \sigma_2 \in \mathrm{Anc}(\sigma)\})$,

- $R_{i+1} = (R_i \cap (Q_{i+1} \times Q_{i+1})) \cup \{(\vartheta, \sigma_1) \mid (\vartheta, \sigma_2) \in R_i\}$.

Thus, we eliminate the subtree rooted in $\sigma_2$ and replace all edges that point to $\sigma_2$ by edges pointing to $\sigma_1$. This follows the idea that the same strategy can be played in $\sigma_2$ as in $\sigma_1$, since the present state and the energy level are the same.
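
The merge step and the resulting sequence $D_0, \ldots, D_n$ can be sketched as follows. This is a minimal illustration, assuming a DAG given as a set of node names, a set of edges, a label map `theta` and a precomputed `level` map (all hypothetical names). Since $\leq$ is antisymmetric on labels, $\sim$ is tested as label equality, and the "leftmost" order is taken to be the sorted order of node names.

```python
from itertools import combinations

def descendants(node, edges):
    """All nodes reachable from node by at least one edge."""
    seen, stack = set(), [node]
    while stack:
        cur = stack.pop()
        for (a, b) in edges:
            if a == cur and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

def merge(nodes, edges, s1, s2):
    """Drop s2 and every node having s2 as ancestor; redirect edges into s2 to s1."""
    kept = nodes - ({s2} | descendants(s2, edges))
    new_edges = {(a, b) for (a, b) in edges if a in kept and b in kept}
    new_edges |= {(a, s1) for (a, b) in edges if b == s2}
    return kept, new_edges

def merge_all(nodes, edges, theta, level):
    """Repeatedly merge the first equivalent pair found on the minimal level."""
    while True:
        pair = None
        for j in sorted({level[v] for v in nodes}):
            layer = sorted(v for v in nodes if level[v] == j)
            for a, b in combinations(layer, 2):
                if theta[a] == theta[b]:  # identical labels <=> a ~ b
                    pair = (a, b)
                    break
            if pair:
                break
        if pair is None:
            return nodes, edges
        nodes, edges = merge(nodes, edges, *pair)
```

When two equivalent leaves sit under different fathers, the surviving leaf ends up with two fathers, which is exactly how the tree turns into a DAG.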

Let $D_i = (Q_i, R_i)$ be a DAG of the sequence $(D_i)_{0 \leq i \leq n}$. Given $\sigma \in Q_i$ and $\vartheta \in \mathrm{Anc}(\sigma)$, we denote by $\vartheta \rightsquigarrow \sigma$ an arbitrary downward path from $\vartheta$ to $\sigma$ in $D_i$. Given a leaf $\sigma \in Q_i$, we denote its oldest energy ancestor by $\mathrm{oea}(\sigma)$. Recall that a strategy is described by such a DAG according to moves of a pebble. Given a leaf $\sigma \in Q_i$ and one of its energy ancestors $\vartheta \in \mathrm{EnAnc}(\sigma)$, we represent the pebble going up from $\sigma$ to $\vartheta$ by $\sigma \circlearrowleft \vartheta$. Given $\alpha, \beta \in (Q_i)^*$, $\alpha \circlearrowleft \beta$ naturally extends this notation s.t. we have $\mathrm{Last}(\alpha) \circlearrowleft \mathrm{First}(\beta)$. We consider energy levels of paths in the tree by referring to their counterparts in the game. Note that given $\vartheta, \sigma \in Q_i$ with $\Theta(\vartheta) = (t, u)$ and $\Theta(\sigma) = (t', u')$, we have $\mathrm{EL}(\vartheta \rightsquigarrow \sigma) = u' - u$. We start with two useful lemmas.
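
The energy level of a pebble path can be computed from the labels alone. This is a minimal sketch under an assumed encoding: a pebble path is a list of node names in which a hypothetical `UP` marker stands for a jump $\circlearrowleft$. Each downward move contributes $u_{\text{child}} - u_{\text{parent}}$ (the weight of the corresponding game edge), and jumps contribute nothing. On a purely downward path the sum telescopes to $u' - u$, as noted above.

```python
# Hypothetical marker for the pebble jumping up from a leaf to an energy ancestor.
UP = "UP"

def energy_level(path, theta):
    """EL of a pebble path given as nodes interleaved with UP markers.

    A downward move from parent to child contributes u_child - u_parent;
    the move right after an UP marker is a jump and contributes nothing.
    """
    total = [0] * len(theta[path[0]][1])
    prev, jumped = path[0], False
    for item in path[1:]:
        if item == UP:
            jumped = True
            continue
        if not jumped:
            u_prev, u_cur = theta[prev][1], theta[item][1]
            total = [t + (b - a) for t, a, b in zip(total, u_prev, u_cur)]
        prev, jumped = item, False
    return tuple(total)
```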

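Lemma 20. Let $D_i = (Q_i, R_i)$ be a DAG of $(D_i)_{0 \leq i \leq n}$. For all nodes $\sigma_1, \sigma_2 \in Q_i$ s.t. $\sigma_1 \sim \sigma_2$, we have that $\forall\, \vartheta \in \mathrm{Anc}(\sigma_1) \cap \mathrm{Anc}(\sigma_2)$, $\mathrm{EL}(\vartheta \rightsquigarrow \sigma_1) = \mathrm{EL}(\vartheta \rightsquigarrow \sigma_2)$.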

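Proof. Since $\sigma_1 \sim \sigma_2$, both nodes carry the same energy vector $u'$; hence, with $\Theta(\vartheta) = (t, u)$, both energy levels equal $u' - u$. □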

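Lemma 21. Let $D_i = (Q_i, R_i)$ be a DAG of $(D_i)_{0 \leq i \leq n}$. Let $\varsigma, \vartheta, \nu, \xi \in Q_i$ be four nodes s.t. $\varsigma$ and $\xi$ are leaves, $\nu$ is the deepest common ancestor of $\varsigma$ and $\xi$, and $\vartheta$ is an ancestor of $\nu$. Let the oldest energy ancestor of $\xi$ be an ancestor of $\varsigma$, i.e., $\mathrm{oea}(\xi) \in \mathrm{Anc}(\varsigma)$. We have that $\mathrm{EL}(\vartheta \rightsquigarrow \varsigma) \leq \mathrm{EL}(\vartheta \rightsquigarrow \nu \rightsquigarrow \xi \circlearrowleft \mathrm{oea}(\xi) \rightsquigarrow \varsigma)$.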

This lemma states that we can extract pebble cycles, which have non-negative energy levels, from a given path, in order to obtain some canonical path whose energy level is lower than or equal to that of the original path (Fig. 10).




Fig. 10. Lemma 21: cycles have positive energy levels. 



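Proof. Let $\chi = \mathrm{oea}(\xi)$ and $\rho = \vartheta \rightsquigarrow \nu \rightsquigarrow \xi \circlearrowleft \chi \rightsquigarrow \varsigma$. Since $\chi \in \mathrm{Anc}(\varsigma) \cap \mathrm{Anc}(\xi)$ and $\nu$ is the deepest common ancestor of $\varsigma$ and $\xi$, we have $\chi \in \mathrm{Anc}(\nu) \cup \{\nu\}$. Therefore, and applying Lemma 20, four cases are possible: $\chi \in \mathrm{Anc}(\vartheta)$, $\chi = \vartheta$, $\chi \in \mathrm{Anc}(\nu) \setminus (\mathrm{Anc}(\vartheta) \cup \{\vartheta\})$, and $\chi = \nu$. Consider the first case, $\chi \in \mathrm{Anc}(\vartheta)$. Then $\rho = \vartheta \rightsquigarrow \nu \rightsquigarrow \xi \circlearrowleft \chi \rightsquigarrow \vartheta \rightsquigarrow \nu \rightsquigarrow \varsigma$. We have
$$\mathrm{EL}(\rho) = \mathrm{EL}(\vartheta \rightsquigarrow \nu) + \mathrm{EL}(\nu \rightsquigarrow \xi) + \mathrm{EL}(\chi \rightsquigarrow \vartheta) + \mathrm{EL}(\vartheta \rightsquigarrow \nu) + \mathrm{EL}(\nu \rightsquigarrow \varsigma) = \mathrm{EL}(\chi \rightsquigarrow \vartheta \rightsquigarrow \nu \rightsquigarrow \xi) + \mathrm{EL}(\vartheta \rightsquigarrow \nu \rightsquigarrow \varsigma).$$
By definition of $\chi = \mathrm{oea}(\xi)$, $\chi$ is an energy ancestor of $\xi$, so the first term is non-negative in every dimension. Thus, $\mathrm{EL}(\rho) \geq \mathrm{EL}(\vartheta \rightsquigarrow \varsigma)$. Arguments are similar for the other cases. □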
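
The first case of the proof can be checked numerically on a toy instance. The sketch below is an illustration with hypothetical labels, not part of the proof: the tree is $\chi \to \vartheta \to \nu$ with two leaf children $\varsigma$ and $\xi$ of $\nu$, and $\xi$ shares its state with $\chi$ and has a larger energy vector, so that $\chi = \mathrm{oea}(\xi)$. It verifies that $\mathrm{EL}(\rho)$ splits into the cycle term $\mathrm{EL}(\chi \rightsquigarrow \xi) \geq 0$ plus $\mathrm{EL}(\vartheta \rightsquigarrow \varsigma)$.

```python
UP = "UP"  # hypothetical marker for a pebble jump to an energy ancestor

def energy_level(path, theta):
    """Sum of u_child - u_parent over downward moves; UP jumps add nothing."""
    total = [0] * len(theta[path[0]][1])
    prev, jumped = path[0], False
    for item in path[1:]:
        if item == UP:
            jumped = True
            continue
        if not jumped:
            total = [t + (b - a) for t, a, b
                     in zip(total, theta[prev][1], theta[item][1])]
        prev, jumped = item, False
    return tuple(total)

# Hypothetical labels: only chi has the same state as xi, and (0, 0) <= (2, 0).
theta = {
    "chi": ("s", (0, 0)),
    "vartheta": ("t", (-1, 2)),
    "nu": ("u", (1, 1)),
    "varsigma": ("v", (0, 3)),
    "xi": ("s", (2, 0)),
}
rho = ["vartheta", "nu", "xi", UP, "chi", "vartheta", "nu", "varsigma"]
direct = ["vartheta", "nu", "varsigma"]
cycle = ["chi", "vartheta", "nu", "xi"]

el_rho = energy_level(rho, theta)
el_direct = energy_level(direct, theta)
el_cycle = energy_level(cycle, theta)
# EL(rho) = EL(chi ~> xi) + EL(vartheta ~> varsigma), and EL(chi ~> xi) >= 0.
assert el_rho == tuple(c + d for c, d in zip(el_cycle, el_direct))
assert all(x <= y for x, y in zip(el_direct, el_rho))
```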

We proceed with the proof of Lemma 5. 

Proof (Lemma 5). Let $(D_i)_{0 \leq i \leq n}$ be the sequence of DAGs defined above. We claim that (i) each DAG describes a winning strategy for the same initial credit, (ii) each DAG has the same depth $l$, and (iii) the last DAG of the sequence has its width bounded by $|S| \cdot (2 \cdot l \cdot W + 1)^k$.

(i) First, recall that $\mathcal{P}_1$ can play a strategy $\lambda_1^T \in \Lambda_1^{PF}$ based on edges taken by a pebble on $T$. Notice that moving the pebble as we previously defined is possible because nodes belonging to $\mathcal{P}_1$ have only one child, and nodes of $\mathcal{P}_2$ have children covering all of his choices once, and only once. Fortunately, the merge operation maintains this property. Therefore, it is straightforward to see that $\mathcal{P}_1$ can also play a strategy $\lambda_1^{D_i} \in \Lambda_1^{PF}$ for a DAG $D_i$ resulting from some merges on $T$. However, while this would be a valid strategy for $\mathcal{P}_1$, we have to prove that it is still a winning one, for the same initial credit $v_0$ as $\lambda_1^T$. Precisely, we claim that $\forall\, i \geq 0$, $\lambda_1^{D_i}$ is winning for $v_0$.

We show it by induction on $D_i$. The base case is trivial as $D_0 = T$: the strategy $\lambda_1^{D_0}$ is winning for $v_0$ by definition. Our induction hypothesis is that our claim is valid for $D_{i-1}$, and we now prove it for $D_i$, by contradiction. Let $\sigma_1, \sigma_2 \in Q_{i-1}(j)$ be the merged nodes, for some level $j$ of $D_{i-1}$. Suppose $\lambda_1^{D_i}$ is not winning for $v_0$. Thus there exists a finite path $\zeta$ of the pebble in $D_i$, which corresponds to a strategy $\lambda_2 \in \Lambda_2^{PF}$ of $\mathcal{P}_2$, s.t. it achieves a negative value on at least one dimension $m$, $1 \leq m \leq k$. We have that $(v_0 + \mathrm{EL}(\zeta))(m) < 0$. We aim to find a similar path $\eta$ in $D_{i-1}$ s.t. $\mathrm{EL}(\eta) \leq \mathrm{EL}(\zeta)$, thus yielding a contradiction, as it would witness that $\lambda_1^{D_{i-1}}$ is not winning for $v_0$.

We denote by $\varsigma_m$ the father of $\sigma_2$ in $D_{i-1}$. The only edge added by the merge operation is $(\varsigma_m, \sigma_1)$.

Obviously, if $\zeta$ does not involve this edge, then we can take $\eta = \zeta$ and immediately obtain a contradiction. Otherwise, we can decompose the witness path as

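$$\zeta = \alpha(1)\,\varsigma_m\sigma_1\,\beta(1) \circlearrowleft \alpha(2)\,\varsigma_m\sigma_1\,\beta(2) \circlearrowleft \cdots \circlearrowleft \alpha(q)\,\varsigma_m\sigma_1\,\xi,$$

for some $q \geq 1$ s.t. for all $1 \leq p \leq q$ (for $\beta(p)$, $1 \leq p < q$), we have that $\alpha(p), \beta(p), \xi \in (Q_i \cup \{\circlearrowleft\})^*$ are valid paths of the pebble in $D_i$ that do not involve the edge $(\varsigma_m, \sigma_1)$, i.e., $(\varsigma_m, \sigma_1) \notin \alpha(p), \beta(p), \xi$; and

$\beta(p) \cap (\mathrm{Anc}_{D_i}(\varsigma_m) \setminus \mathrm{Anc}_{D_{i-1}}(\sigma_1)) = \emptyset$, $\mathrm{Last}(\beta(p))$ is a leaf and $\mathrm{oea}(\mathrm{Last}(\beta(p))) \in \mathrm{Anc}_{D_i}(\varsigma_m)$.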

Intuitively, $\zeta$ is split into several parts with regard to $q$, the number of times it takes the added edge $(\varsigma_m, \sigma_1)$. Each time, this transition is preceded by some path $\alpha$. It is then followed by some path $\beta$ where all visited ancestors of $\varsigma_m$ were already ancestors of $\sigma_1$ in $D_{i-1}$ (thus, $\beta$ paths can be kept in $\eta$). Finally, after the $q$-th transition $\varsigma_m\sigma_1$ is taken, the path $\zeta$ ends with a finite sub-path $\xi$.

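We define the witness path $\eta$ in $D_{i-1}$ as

$$\eta = \kappa(1)\,\beta(1) \circlearrowleft \kappa(2)\,\beta(2) \circlearrowleft \cdots \circlearrowleft \kappa(q)\,\xi,$$

with the following transformation of sub-paths $\alpha(p)\,\varsigma_m\sigma_1$:

- $\kappa(1) = r \rightsquigarrow_{D_{i-1}} \sigma_1$, where $r$ is the root,

- $\forall\, 2 \leq p \leq q$, $\kappa(p) = \mathrm{oea}(\mathrm{Last}(\beta(p-1))) \rightsquigarrow_{D_{i-1}} \sigma_1$,

where $\rightsquigarrow_{D_{i-1}}$ denotes a valid path in $D_{i-1}$. Note that given the preceding definitions, this indeed constitutes a valid path in $D_{i-1}$. We have to prove that

$$\mathrm{EL}(\eta) \leq \mathrm{EL}(\zeta).$$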

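We have

$$\mathrm{EL}(\eta) = \sum_{1 \leq p \leq q} \mathrm{EL}(\kappa(p)) + \sum_{1 \leq p \leq q-1} \mathrm{EL}(\beta(p)) + \mathrm{EL}(\xi),$$

and

$$\mathrm{EL}(\zeta) = \sum_{1 \leq p \leq q} \mathrm{EL}(\alpha(p)\,\varsigma_m\sigma_1) + \sum_{1 \leq p \leq q-1} \mathrm{EL}(\beta(p)) + \mathrm{EL}(\xi).$$

Thus, it remains to show that

$$\sum_{1 \leq p \leq q} \mathrm{EL}(\kappa(p)) \leq \sum_{1 \leq p \leq q} \mathrm{EL}(\alpha(p)\,\varsigma_m\sigma_1).$$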

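In particular, we claim that for all $1 \leq p \leq q$, we have $\mathrm{EL}(\kappa(p)) \leq \mathrm{EL}(\alpha(p)\,\varsigma_m\sigma_1)$. Indeed, notice that $\kappa(p)$ and $\alpha(p)\,\varsigma_m\sigma_1$ share their starting and ending nodes, and that $\alpha(p)$ contains a finite number of pebble cycles. Let $\vartheta$ denote the common starting node of both $\kappa(p)$ and $\alpha(p)$. Applying Lemma 21 on $\alpha(p)$, we can eliminate cycles one at a time, without ever increasing the energy level, and obtain a path $\vartheta \rightsquigarrow \varsigma_m\sigma_1$ s.t. $\mathrm{EL}(\vartheta \rightsquigarrow \varsigma_m\sigma_1) \leq \mathrm{EL}(\alpha(p)\,\varsigma_m\sigma_1)$. Since $\sigma_1 \sim \sigma_2$, we have by Lemma 20 that $\mathrm{EL}(\vartheta \rightsquigarrow_{D_i} \varsigma_m\sigma_1) = \mathrm{EL}(\vartheta \rightsquigarrow \varsigma_m\sigma_2) = \mathrm{EL}(\vartheta \rightsquigarrow_{D_{i-1}} \sigma_1)$, implying the claim.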

Consequently, we obtain $\mathrm{EL}(\eta) \leq \mathrm{EL}(\zeta)$, which witnesses that $\lambda_1^{D_{i-1}}$ was not winning. This contradicts our induction hypothesis and concludes our proof that for all $0 \leq i \leq n$, $\lambda_1^{D_i}$ is winning for $v_0$.

(ii) Second, the merge operation only prunes some parts of the tree $T$, without ever adding any new node, and added edges are on existing successive levels. Therefore, each $D_i$ has the same depth $l$.

(iii) Third, the last DAG of the sequence, $D_n$, is s.t. for every level $j$ and for all $\sigma_1, \sigma_2 \in Q_n(j)$, we have $(\sigma_1 \neq \sigma_2) \Rightarrow (\sigma_1 \not\sim \sigma_2)$. Therefore the width of this DAG is bounded by the number of possible non-equivalent nodes. Recall that two nodes are equivalent if they have the same labels, i.e., they represent the same state of the game and are marked with exactly the same energy level vector. Since the maximal change in energy level on an edge is $W$, and the depth of the DAG is bounded by $l = 2^{(d-1)\cdot|S|} \cdot (W \cdot |S| + 1)^{c \cdot k^2}$ thanks to Lemma 3, the possible vectors for each state lie in $\{-l \cdot W, -l \cdot W + 1, \ldots, l \cdot W - 1, l \cdot W\}^k$. Consequently, the width of $D_n$ is bounded by

$$|S| \cdot (2 \cdot l \cdot W + 1)^k = |S| \cdot \left(2^{(d-1)\cdot|S| + 1} \cdot (W \cdot |S| + 1)^{c \cdot k^2} \cdot W + 1\right)^k,$$

which is still single exponential. □
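
The counting argument behind the width bound can be reproduced with a few lines of arithmetic. This is a sketch assuming integer parameters `d`, `S` (for $|S|$), `W`, `k` and the constant `c` of Lemma 3, which is left as an input here since its value is not fixed in this section. The width is obtained by counting labels: $|S|$ states times the number of vectors in $\{-lW, \ldots, lW\}^k$.

```python
def depth_bound(d: int, S: int, W: int, k: int, c: int) -> int:
    """l = 2^((d-1)|S|) * (W|S| + 1)^(c k^2), the depth bound taken from Lemma 3.

    c is the constant of Lemma 3; it is supplied by the caller.
    """
    return 2 ** ((d - 1) * S) * (W * S + 1) ** (c * k * k)

def width_bound(d: int, S: int, W: int, k: int, c: int) -> int:
    """|S| * (2 l W + 1)^k: one node per (state, energy vector) label."""
    l = depth_bound(d, S, W, k, c)
    values_per_dim = len(range(-l * W, l * W + 1))  # integers in [-lW, lW]
    return S * values_per_dim ** k
```

For instance, with all parameters equal to 1 the depth bound is 2 and the width bound is 5, and for any inputs the result coincides with the expanded right-hand side of the displayed bound.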